PREFACE


There are two quite distinct aspects or levels of mathematical 
statistics. The one involves elementary mathematics and the 
methodologies serve descriptive purposes. These fundamentals are 
set forth in Part I. The other aspect is essentially mathematical in 
character and the methodologies are developed for inferential pur- 
poses. It cannot be made elementary by its very nature because 
the problems are so difficult that powerful mathematical tools are 
necessary to provide solutions of the problems. 

In recent years great advances have been made in statistical 
theory. Methods of formulating and testing hypotheses have been 
systematically developed and a sound basis for statistical inference 
has replaced older methods involving the intuitive notions of “probable
error.” In this book I have elected to include some of the
classical theory and some of the simpler concepts and techniques of 
the modern theory. In short, I have made a sustained effort to write 
an up-to-date text which will serve to prepare the student for the 
really mathematical part of the theory of statistics. A knowledge 
of elementary probability theory, calculus, and determinants is pre- 
supposed. It is also understood that the student is familiar with the 
rudiments of statistics such as are given in Part I. However, if no 
preliminary course in statistics has been studied, mature students 
should be able to acquire the essential definitions and concepts in a 
rapid survey of Part I. 

Of the books which have been particularly useful in preparing the 
manuscript, I would name the following: Camp's Mathematical
Statistics, Fisher's Statistical Methods for Research Workers, Fry's
Probability And Its Engineering Uses, Rietz's Mathematical Statistics, 
and Wilks' Statistical Inference. I have also derived much help from 
certain papers in the literature by Professors Carver, Jackson, Rider, 
and Rietz. Specific reference to these papers is made in the text. 
A reference list of pertinent books and papers is given at the end of 
each of the last three chapters. It is recommended that some of 
these be available to the student for supplementary study in connec- 
tion with this text. 
CHAPTER I

PROBABILITY AND ITS RELATION TO STATISTICAL THEORY. THE
BERNOULLI DISTRIBUTION. APPROXIMATIONS BY MEANS OF THE
NORMAL CURVE AND POISSON EXPONENTIAL FUNCTION

1. Importance. The subject of probability deals with one of the 
most interesting branches of modern mathematics and is becoming 
conspicuous for its applications in many fields of learning. This sub- 
ject is of fundamental importance, not only in the theory of insur- 
ance and statistics, but also in various branches of the biological and 
physical sciences. The following quotations from contemporary 
writers indicate the importance of probability theory in the philosophy 
of modern science. 

It was, I think, Huxley who said that six monkeys, set to strum unintelligently 
on typewriters for millions of millions of years, would be bound in time to write 
all the books in the British Museum. If we examined the last page which a 
particular monkey had typed, and found that it had chanced, in its blind strum- 
ming, to type a Shakespeare sonnet, we should rightly regard the occurrence as a 
remarkable accident, but if we looked through all the millions of pages the mon- 
keys had turned off in untold millions of years, we might be sure of finding a 
Shakespeare sonnet somewhere amongst them, the product of the blind play of 
chance. . . . 

These and other considerations have led many physicists to suppose that there 
is no determinism in events in which atoms and electrons are involved singly, and 
that the apparent determinism in large-scale events is only of a statistical nature. 
When we are dealing with atoms and electrons in crowds, the mathematical law 
of averages imposes the determinism which physical laws fail to provide. . . . 
We can only speak in terms of probabilities. 

— The Mysterious Universe, Sir James Jeans.

In order to understand the nature of knowledge about social and economic life, 
it is necessary to know something about the theory of probability; because 
knowledge in these fields, in general, is essentially indeterminate knowledge. 
There are two fundamental ideas which need to be grasped in order to understand
the social sciences. The first idea is that all science is philosophical. . . . The
time honored aim of philosophy has been to discover and interpret (to the extent
possible to the human mind) the characteristics of nature. By nature is meant 
all things, material and psychic, external to man, and man himself. In many 
fields the minds of men have penetrated into the mysteries of nature and have 
produced knowledge concerning them. In the physical aspects (both external 
to man and in man) great progress has been made towards the attainment of 
apparently precise knowledge, within certain definite limits; while in the field 
of the psychic the progress has been towards increasing the probabilities of truth 
of a great variety of hypotheses. But it is characteristic of the psychic aspects 
of knowledge that the facts in those fields are indeterminate, not precise, and 
apparently dynamic. Even in the physical and chemical world, the discoveries 
of recent years have emphasized a great realm of indeterminacy, particularly 
when confronting great velocities and infinitely small particles within the atom. 
Thus the second idea to grasp is that in all fields of knowledge, even the physical, 
beyond the limited range of relatively precise knowledge accumulated by man, 
there is a vast frontier of speculation. It has been the function of scientific 
method — the new tool of philosophy — to penetrate ever deeper into this realm 
of speculative knowledge. Primarily this has been made possible by the develop- 
ment of the theory of probabilities. 

— Elementary Statistics, James G. Smith.

There exist in nature systems of chance causes which operate in such a way 
that the effects of these causes can be predicted — by making use of customary 
probability theory in which objective probabilities in the limiting statistical 
sense are substituted for the mathematical probabilities. 

— Economic Control of Quality of Manufactured Product, W. A. Shewhart.

It appears likely that the further development of the theory of probability in 
the next few decades may turn out to be a major chapter in the history of science. 

— Science, January 18, 1929.

The great extension in the use of statistics in the last two decades has been 
associated with and largely made possible by mathematical developments based 
upon the theory of probability. 

— Harold Hotelling, Journal American Statistical Association, March Supplement, 1931.

2. Definitions. Inasmuch as the subject of probability plays an 
important role in certain phases of statistical theory, we will now 
consider some of the fundamental principles of this subject. It will be 
convenient to divide the subject into two classes, and speak of a 
priori and empirical probability. 

(a) A priori probability. If all the ways of obtaining successes and 
failures can be analyzed into s possible mutually exclusive ways each 
of which is equally likely, and if x of these ways give successes, the 
probability of success in a single trial is x/s.

A priori probability is concerned with that class of problems in
which the conditions affecting the event in question are fully
known beforehand. In other words, the problem may be set up
and solved abstractly. Thus the following problems are questions of 
a priori probability: A box contains 4 white and 6 red billiard balls. 
What is the probability that two drawn will be of the same color? 
A coin is to be tossed 7 times. What is the probability that heads 
will turn up at least 3 times? A sample of telephone receivers is 
to be taken from a case containing 100 telephone receivers of which 
20 are known to be defective. What is the probability that the sam- 
ple will contain exactly 2 defectives? 
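
The definition of a priori probability as x/s can be checked by brute-force enumeration. The following is a minimal sketch in Python, not part of the original text: the helper name `a_priori` is our own, and it simply counts successes among a listed set of equally likely outcomes for the first two problems stated above. The third problem is omitted because the sample size is not given.

```python
from fractions import Fraction
from itertools import combinations, product

def a_priori(outcomes, is_success):
    """Return x/s: the number of successes over s equally likely outcomes."""
    outcomes = list(outcomes)
    x = sum(1 for o in outcomes if is_success(o))
    return Fraction(x, len(outcomes))

# A box contains 4 white and 6 red balls; two are drawn without replacement.
balls = ["W"] * 4 + ["R"] * 6
pairs = combinations(range(len(balls)), 2)      # 45 equally likely pairs
p_same_color = a_priori(pairs, lambda ij: balls[ij[0]] == balls[ij[1]])

# A coin is tossed 7 times; heads turns up at least 3 times.
tosses = product("HT", repeat=7)                # 128 equally likely sequences
p_at_least_3 = a_priori(tosses, lambda t: t.count("H") >= 3)

print(p_same_color)   # 7/15
print(p_at_least_3)   # 99/128
```

Enumeration is practical only for small problems; the theorems of § 3 give the same counts without listing every case.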

There is another class of events in which it is impossible or im- 
practical to enumerate all of the equally likely ways in which the 
event in question may succeed or fail. When this is the case it is 
necessary to estimate the probability by trial and observation. Thus 
we have 

(b) Empirical probability. If it is observed that an event has
occurred x times among s trials, the ratio x/s is called the relative
frequency of success. The limit* of the ratio x/s as s is taken
indefinitely large is called the probability of success in a single trial.
In symbols we have

$$\lim_{s \to \infty} \frac{x}{s} = p.$$

In statistical applications the limit of x/s cannot in general be
determined, but an observed relative frequency (s large) often provides
a valuable estimate of the underlying probability assumed in the
definition. For example, according to the American Experience

*The student familiar with the theory of limits will realize that a rigorous 
proof that a probability p exists as the limit of x/s as s increases would require 
us to show that, given an $\epsilon > 0$, there exists a number N such that

$$\left| \frac{x}{s} - p \right| < \epsilon \quad \text{for all } s \ge N.$$

It is of course obvious that we cannot prove the existence of this limit because
we cannot be sure that the difference |x/s - p| will become and remain, as s
increases, less than any assigned positive number, no matter how small. For
example, after throwing a coin 10,000 times it is possible to get a run of all heads
in the next 1000 throws. In this connection Rietz says: “That the limit exists
is an empirical assumption whose validity cannot be proved, but experience with
data in many fields has given much support to the . . . usefulness of the
assumption.” (Mathematical Statistics, p. 8.)

We can, however, prove that the probability approaches certainty that x/s 
will approach p as a limit as s is indefinitely increased. (See § 7.) 





Mathematics of Statistics 


Mortality Table, out of 57,917 persons living aged 60, there are 1546
who die during the following year. Therefore, the relative frequency
1546/57,917 = .026693 is taken by insurance companies as the
probability that a person aged 60 will not survive another year.

3. Theorems. We will now review from algebra certain elementary
formulas and theorems leading to the use of probability theory
in statistical problems. We will begin with the subject of
permutations and combinations.

A permutation is an arrangement of all or part of a set of things.
A combination is a group of all or part of a set of things. A different
permutation may be obtained by changing either the items or their
order but a different combination may be obtained only by changing
one or more of the items in the group.

Theorem I. The number of permutations of n different things taken
r at a time is denoted by the symbol P(n, r) and given by

$$P(n, r) = n(n - 1)(n - 2) \cdots (n - r + 1).$$

Corollary. If the n items are not all different, there being $n_1$ of
type $T_1$, $n_2$ of type $T_2$, ..., $n_k$ of type $T_k$, then the number of distinct
permutations of the n items taken n at a time is

$$\frac{n!}{n_1!\, n_2! \cdots n_k!}$$

where $\sum_{i=1}^{k} n_i = n$. The symbol n!, read “factorial n,” is defined by

$$n! = n(n - 1)(n - 2) \cdots 3 \cdot 2 \cdot 1.$$

Theorem II. The number of combinations of n different things
taken r at a time is denoted by C(n, r) and given by

$$C(n, r) = \frac{P(n, r)}{r!} = \frac{n!}{r!\,(n - r)!}.$$

It will be understood that C(n, r) equals zero when r > n and equals
one when r = n.
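
Theorems I and II and the Corollary are easy to try numerically. The sketch below is ours, not the author's. It assumes Python 3.8 or later, where `math.perm` and `math.comb` are available; the function names `P`, `C`, and `distinct_permutations` are chosen only to mirror the notation of the text.

```python
import math

def P(n, r):
    """Permutations of n different things taken r at a time (Theorem I)."""
    return math.perm(n, r)

def C(n, r):
    """Combinations of n different things taken r at a time (Theorem II)."""
    return math.comb(n, r)   # returns 0 when r > n, as the text stipulates

def distinct_permutations(counts):
    """n!/(n_1! n_2! ... n_k!) for n items falling into k types (Corollary)."""
    total = math.factorial(sum(counts))
    for c in counts:
        total //= math.factorial(c)
    return total

print(P(9, 4))                          # 3024
print(C(8, 4))                          # 70
print(distinct_permutations([4, 7]))    # 330
print(C(5, 7))                          # 0
print(P(9, 4) == C(9, 4) * P(4, 4))     # True
```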

Theorem III. The total number of combinations of n different things
taken 1, 2, ..., or n at a time is $2^n - 1$.

Proof. The formula for C(n, r) is the coefficient of the (r + 1)st
term in the binomial expansion of $(x + y)^n$. Thus,

$$(x + y)^n = x^n + C(n, 1)x^{n-1}y + C(n, 2)x^{n-2}y^2 + \cdots + C(n, r)x^{n-r}y^r + \cdots + y^n.$$

If we let x = y = 1, this becomes

$$2^n - 1 = C(n, 1) + C(n, 2) + \cdots + C(n, r) + \cdots + C(n, n).$$
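
As an informal check of Theorem III, we may list every non-empty selection from a small set and compare the count with the formula. This is an illustrative sketch only; the set size n = 6 and the helper name are arbitrary choices of ours.

```python
from itertools import combinations
from math import comb

def count_nonempty_selections(items):
    """Count selections of 1, 2, ..., n items by listing every one of them."""
    return sum(1 for r in range(1, len(items) + 1)
                 for _ in combinations(items, r))

n = 6
by_enumeration = count_nonempty_selections(list(range(n)))
by_summation = sum(comb(n, r) for r in range(1, n + 1))

print(by_enumeration, by_summation, 2**n - 1)   # 63 63 63
```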

Events of a set are said to be mutually exclusive if the occurrence 
of any one of them on a particular occasion excludes the occurrence 
of any other on that occasion. They are said to be independent or 
dependent according as the occurrence of any one of them does not 
or does affect the occurrence of others in the set. 

If p is the probability that an event will happen in a single trial
and q is the probability that the event will fail (to happen) in a single
trial, then p + q = 1 and unity is the symbol for certainty.

Theorem IV. The probability that one or other of a set of mutually 
exclusive events should happen when all of them are in question is the 
sum of the probabilities for the separate events. 

Theorem V. The probability that all of a set of independent events 
will happen on a given occasion when all of them are in question is the 
product of the probabilities for the separate events. 

Theorem VI. Suppose the events are dependent. Let $p_1$ be the
probability for the happening of a first event $E_1$ and $p_2$ be the probability
for the occurrence of a second event $E_2$ after $E_1$ has happened. Then the
probability that both events will happen in the order named is $p_1 p_2$.
The procedure may be extended in an obvious manner to any finite 
number of events. 
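
A short numerical illustration of Theorems IV, V, and VI follows. It is a sketch of our own, using the box of 4 white and 6 red balls mentioned in § 2 and a pair of dice as examples; exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

# Theorem VI (dependent events): two balls drawn in succession, without
# replacement, from a box of 4 white and 6 red balls.
p_white_then_white = Fraction(4, 10) * Fraction(3, 9)
p_red_then_red = Fraction(6, 10) * Fraction(5, 9)

# Theorem IV (mutually exclusive events): "both white" and "both red"
# cannot happen together, so their probabilities add.
p_same_color = p_white_then_white + p_red_then_red

# Theorem V (independent events): a six on each of two dice.
p_double_six = Fraction(1, 6) * Fraction(1, 6)

print(p_same_color)    # 7/15
print(p_double_six)    # 1/36
```

The value 7/15 agrees with a direct count of the C(10, 2) = 45 equally likely pairs, of which C(4, 2) + C(6, 2) = 21 are of one color.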

4. Supplementary Reading. It is suggested that the student look 
up the proofs of the above theorems in any college algebra text and 
review the discussions presented there. 

For the more advanced student the following references are recom- 
mended. Some of the early chapters of the books may also be read 
with profit by the beginning student. 

Books:

The Mathematical Theory of Probabilities, by Arne Fisher.
Probability, by Coolidge.
Mathematical Statistics, by Rietz.
Choice and Chance, by Whitworth.
Probability and Its Engineering Uses, by Fry.
Elements of Probability, by Levy and Roth.
Introduction to Mathematical Probability, by Uspensky.

Papers:

Fundamental Concepts in the Theory of Probability, by Fry, American
Mathematical Monthly, vol. 41, 1934, p. 207.
On the Foundations of the Theory of Probability, by Struik, Philosophy of Science,
vol. 1, no. 1, January, 1934.
Problems

1. Prove both algebraically and verbally that
(a) P(n, r) = C(n, r)P(r, r), (b) C(n, r) = C(n, n - r).

2. From among nine men A, B, C, D, E, F, G, H, I, a committee of
four men will be chosen. The nine names will be written on nine
separate cards and four cards drawn at random one at a time from
a box.

(a) In how many different ways may the four cards come out?
Ans. 3024.

(b) How many different committees are possible not including the man A?
Ans. 70.

3. Consider the word “introduce.”

(a) In how many of the possible arrangements of all its letters will there be
a consonant in the first place? Ans. 201,600.

(b) From its letters how many four letter permutations consisting of three
vowels and one consonant can be formed? Ans. 480.

(c) If five of its letters are selected at random what is the probability that
two are vowels and three are consonants? Ans. 10/21.

4. On a table there are four different biographies with brown backs and seven
different novels with red backs.

(a) If all of the books are placed upright in a row on a shelf, in how many
different ways may they be arranged so that the orders of the colors
are different? Ans. 330.

(b) In how many different ways may two of the biographies and three of
the novels be selected and arranged on the shelf so that the orders of the
books are different? Ans. 25,200.

5. In a box there are five red billiard balls with the numbers 1, 2, 3, 4, 5 painted
on them (one on each ball), and three white billiard balls with the numbers
1, 2, 3 similarly painted on them. From the box a man draws two balls at
random.

(a) What is the probability that one of the balls drawn is white and the other
is red? Ans. 15/28.

(b) What is the probability that the two balls drawn have either the same
color or the same number? Ans. 4/7.

6. A bag contains four white, five red, and six black balls. Three are drawn
at random. Find the probability that (a) no ball drawn is black, (b)
exactly two are black, (c) all are of the same color.

7. An urn contains four white and five black balls. Three balls are drawn at
random and replaced by green balls. If then two balls are drawn at
random, what is the probability that they are both of the same color? Ans.
29/108.

8. Write out the expressions for C(n - 1, 2); C(n - 1, 3); C(s, x).

9. (a) Show that

$$C(s - 1, x - 1) = \frac{(s - 1)(s - 2) \cdots (s - x + 1)}{(x - 1)!} = \frac{(s - 1)!}{(x - 1)!\,(s - x)!}.$$

(b) What is the value of the above expression when x = 1?

10. Write in expanded form:

(a) $\sum_{x=0}^{s} C(s, x)$.

(b) $\sum_{x=1}^{s} C(s - 1, x - 1)$.

11. Twelve cards have been dealt, six down, and the other six showing a jack,
two kings, a seven, a five, and a four. What is the probability that the
next card will be a four or less? (National Mathematics Magazine, vol.
XIII, no. 2, p. 94.)

12. From an urn containing ten balls, numbered from one to ten, balls are drawn
one by one and placed in a row of holes, numbered from one to ten, each
ball being placed in the proper hole. What is the probability that there
will not be an empty hole between two filled ones at any time of the
drawing? (American Mathematical Monthly, vol. 45, no. 9, p. 635.)
Ans. 2/14,175.

5. Repeated Trials. We now consider a theorem which is very
important both in the theory of probability and its applications in
statistics.

Theorem VII. Let p be the probability that an event will happen
in a single trial, and q = 1 - p the probability that the event will fail
in a single trial. Then the probability P that the event will happen
exactly x times in s trials, during which p remains constant, is given by
the (x + 1)st term of the binomial expansion:

$$(q + p)^s = q^s + C(s, 1)pq^{s-1} + C(s, 2)p^2q^{s-2} + \cdots + C(s, x)p^xq^{s-x} + \cdots + p^s. \tag{1}$$

Proof. By Theorem V, the probability that the event will happen
x times and fail the other s - x times in any specified order is
$p^x q^{s-x}$. But the number of ways in which the order may be specified
is C(s, x) or C(s, s - x). These ways are equally likely and mutually
exclusive. Therefore, by Theorem IV the required probability is
$C(s, x)p^x q^{s-x}$. We recognize this expression as the (x + 1)st term
of the binomial expansion of $(q + p)^s$.

Corollary 1. The probability that the event will happen at most
x times in s trials is the sum of all those terms of (1) in which the
exponent of p is equal to or less than x.

Corollary 2. The probability that the event will happen at least
x times in s trials is the sum of all those terms of (1) in which the
exponent of p is equal to or greater than x.

Proofs. By Theorem IV, the probability that the event will happen
at most x times is the sum of the probabilities that it will happen
0, 1, 2, 3, ..., x times. Similarly, the probability that the event will
happen at least x times is the sum of the probabilities that it will
happen s, s - 1, s - 2, ..., x times.
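
Theorem VII and its two corollaries translate directly into a few lines of code. The following is a minimal sketch under our own naming (`binomial_term`, `at_most`, `at_least` do not appear in the text). It is applied to the coin tossed 7 times that was mentioned in § 2, with exact fractions throughout.

```python
from fractions import Fraction
from math import comb

def binomial_term(s, x, p):
    """The (x + 1)st term of (q + p)^s, namely C(s, x) p^x q^(s - x)."""
    q = 1 - p
    return comb(s, x) * p**x * q**(s - x)

def at_most(s, x, p):
    """Corollary 1: sum of the terms whose exponent of p is <= x."""
    return sum(binomial_term(s, k, p) for k in range(0, x + 1))

def at_least(s, x, p):
    """Corollary 2: sum of the terms whose exponent of p is >= x."""
    return sum(binomial_term(s, k, p) for k in range(x, s + 1))

p = Fraction(1, 2)
print(binomial_term(7, 3, p))   # 35/128
print(at_most(7, 2, p))         # 29/128
print(at_least(7, 3, p))        # 99/128
```

Since "at most 2" and "at least 3" exhaust all cases, the last two results add to unity.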


Problems 


1. A dust storm contains particles of two kinds identical except as to color,
brown and yellow particles existing in the ratio 3:2. If five particles of
this dust enter my eye at random determine the probability that two of
them are brown and the other three are yellow. (See American
Mathematical Monthly, vol. 41, no. 5, May 1934.)

2. Six coins are tossed once, or what amounts to the same thing, one coin is
tossed six times. Find the probability of obtaining heads
(a) exactly three times
(b) at most three times
(c) at least three times
(d) at least once.

3. (a) What is the probability of throwing seven in a single toss of two dice?
(b) In six tosses of two dice find the probability of throwing seven at least once.

4. Toss six coins 64 times and record the number of times heads appear 0, 1, 2,
3, 4, 5, 6 times. (Instead of tosses, the coins may be shaken in a box.)
Compare the resulting distribution of frequencies with the terms of the
expansion of $64(\tfrac{1}{2} + \tfrac{1}{2})^6$.

5. A bag contains white and black balls in the proportion 2:3. Let the
drawing of a white ball be called a success. Three balls are
drawn separately and after each drawing the ball is returned to the bag
and thoroughly mixed with the others so that the fundamental probability
of success remains constant during the trials. Find the probabilities of
0, 1, 2, 3 successes. If this experiment were repeated 125 times what is
the theoretical frequency of each of the possible number of successes?

6. Show that equation (1) may be written:

$$(q + p)^s = \sum_{x=0}^{s} \frac{s!}{x!\,(s - x)!}\,p^x q^{s-x}.$$

7. Show that

$$\sum_{x=1}^{s} \frac{(s - 1)!}{(x - 1)!\,(s - x)!}\,p^{x-1} q^{s-x} = (q + p)^{s-1} = 1.$$


8. (a) Find the values of C(18, x) for x = 0 to x = 18 inclusive. (To the
instructor: Pascal's Triangle provides a simple scheme for constructing a
table of binomial coefficients.)

(b) Evaluate $2^x/3^{18}$ for x = 0 to x = 18 inclusive.

(c) Show that $(\tfrac{1}{3} + \tfrac{2}{3})^{18}$ may be written

$$\sum_{x=0}^{18} f(x) \quad \text{where } f(x) = C(18, x)\,2^x/3^{18}.$$

(d) Using the results of (a) and (b), find the values of f(x) for x = 0 to
x = 18. Save your results for future reference.
6. Relative Frequencies from Dichotomous Samples. Suppose a
sample of s individuals from the same population is divided into two
groups according as they have a certain attribute or not. Such a
division is said to be dichotomous. Out of s individuals we find that x
individuals have the attribute in question and s - x do not, it being
possible for x to take any integral value from 0 to s inclusive. The
attribute in question is frequently called the “event” and its
occurrence is called a “success.” The ratio x/s is called the relative
frequency of success.

Many illustrations of relative frequency come readily to mind. 
Out of 100 throws of a coin we may have noted 45 heads. From 
a group of school children, taken at random, we may find 55 
boys. Or again, we might make a certain disease of children the 
basis of a dichotomy. Out of 100 fifth grade school children we 
may find that 27/100 is the relative frequency of the occurrence of 
measles. 

7. Theorem of Bernoulli. The theorem of Bernoulli describes 
the approach of the relative frequency x/s to the underlying con- 
stant probability p as s increases. The theorem may be stated as 
follows: 

Theorem VIII. In a set of s trials in which the chance of success in
each trial is a constant p, the probability P approaches unity that the
relative frequency x/s will approach p as a limit as s increases
indefinitely.*

Observe that this is a weaker statement than saying that p is the 
limit of x/s as the number of trials increases indefinitely. Another 
way of stating the theorem is as follows: The probability Q = 1 - P
of the difference (x/s - p) being numerically as large as any assigned
positive number ε will approach zero as a limit as s increases
indefinitely.

The theorem is the basis for our definition of empirical probability. 
It is often regarded as a fundamental theorem of mathematical 
statistics because of the common use of x/s (s large) as a close approxi- 
mation to the probability p. 
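
The second statement of the theorem can be illustrated numerically. The sketch below is our own and is no substitute for the proof. It computes Q, the probability that x/s differs from p by at least ε, exactly from the binomial terms of § 5, for a few values of s. The choices p = 1/2 and ε = 1/10, and the function name, are arbitrary assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def prob_deviation_at_least(s, p, eps):
    """Q = probability that |x/s - p| >= eps in s trials, computed exactly."""
    q = 1 - p
    total = Fraction(0)
    for x in range(s + 1):
        # Exact fractions avoid rounding trouble at the boundary |x/s - p| = eps.
        if abs(Fraction(x, s) - p) >= eps:
            total += comb(s, x) * p**x * q**(s - x)
    return total

p, eps = Fraction(1, 2), Fraction(1, 10)
for s in (10, 100, 1000):
    print(s, float(prob_deviation_at_least(s, p, eps)))
```

For s = 10 the value is exactly 193/256, and the printed values shrink rapidly toward zero as s grows, which is what the theorem asserts.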

8. Binomial Description of Frequency. The terms of $(q + p)^s$
are the theoretical relative frequencies for a dichotomous situation.
If we take N sets of s trials the theoretical absolute frequencies are
given by the terms of $N(q + p)^s$ when N is chosen so that these terms
are integers. It follows that N is merely a proportionality factor.

* A proof is given in Chapter Vb § 10.

Hence we may say that if in a single trial the probability of an event
occurring is p and the probability of its not occurring is q, then if
a sample of s trials is taken, the frequencies with which the event
occurs 0, 1, 2, 3, ..., s times are proportional to the terms of the
point binomial $(q + p)^s$. This was the first theoretical distribution
to be established and a discussion of it is given in Ars Conjectandi by 
J. Bernoulli which was published posthumously in 1713. A distribu- 
tion of discrete variates with frequencies proportional to the terms of 
(1) is frequently referred to as a Bernoulli distribution. 

In the Carus Monograph on Mathematical Statistics (p. 23) Professor
Rietz explains the applications and limitations of (1) in practical
statistics as follows: 

Such a distribution . . . serves as a norm for the distributions of relative frequen- 
cies obtained from some of the simplest sampling operations in applied statistics. 
For example, the geneticist may regard the Bernoulli distribution (1) as the 
theoretical distribution of the relative frequencies x/s of green peas which he
would obtain among random samples each consisting of a yield of s peas. The 
biologist may regard (1) as the theoretical distribution of the relative frequencies 
of male births in samples of s births. The actuary may regard (1) as the theo- 
retical distribution of yearly death rates in samples of s men of equal ages, say 
of age 30, drawn from a carefully described class of men. In this case we specify 
that the samples shall be taken from a carefully described class of men because 
the underlying assumptions involved in (1) do not permit a careless selection of
data. Thus, it would not be in accord with the assumptions to take some of 
the samples from a group of teachers with a relatively low rate of mortality and 
others from a group of anthracite coal miners with a relatively high rate of 
mortality. ... 

The expression simple sampling is sometimes applied to drawing a random 
sample when the conditions for repetition just described are fulfilled. In other 
words, simple sampling implies that we may assume the underlying probability 
p of formula (1) remains constant from sample to sample, and that the drawings
are mutually independent in the sense that the results of drawings do not depend 
in any significant manner on what has happened in previous drawings. 

9. Graphical Representation. A binomial distribution may be 
represented graphically by a histogram. This is accomplished by 
constructing rectangles centered at x = 0, 1, 2, ..., s with heights
proportional to the terms of the binomial. The different “successes”
denoted by x are the variates, and the corresponding terms of the 
binomial are the theoretical relative frequencies. 

Since the values of x constitute a discrete series it might seem more
logical to represent the relative frequencies by ordinates instead of
rectangles. However, since the base of each rectangle is unity the
number representing its height is also its area, and the representation
by areas will be useful in our work. In a case like this the frequencies
are said to be “loaded” on the ordinates at the mid-points of the
class intervals.

If we are thinking of relative frequencies or probabilities the total
area of all the rectangles is unity, whereas if we are thinking of
absolute frequencies the total area of the histogram is N. Thus if six
coins are tossed 64 times the theoretical absolute frequencies are given
by the terms of $64(\tfrac{1}{2} + \tfrac{1}{2})^6$. These are 1, 6, 15, 20, 15, 6, 1
and their sum is 64.

Fig. 1. Histogram of $(\tfrac{1}{2} + \tfrac{1}{2})^6$ (horizontal axis: x = 0, 1, 2, 3, 4, 5, 6)
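
The frequencies quoted for six coins tossed 64 times, and a rough picture of the histogram, can be produced with a short script. This is an illustrative sketch of ours; the function name `theoretical_frequencies` and the use of `#` marks for the rectangles are our own devices, not the author's.

```python
from fractions import Fraction
from math import comb

def theoretical_frequencies(N, s, p):
    """Terms of N(q + p)^s: expected counts of 0, 1, ..., s successes."""
    q = 1 - p
    return [N * comb(s, x) * p**x * q**(s - x) for x in range(s + 1)]

freqs = theoretical_frequencies(64, 6, Fraction(1, 2))
print([int(f) for f in freqs])    # [1, 6, 15, 20, 15, 6, 1]

# A crude text histogram: one mark per unit of absolute frequency.
for x, f in enumerate(freqs):
    print(f"{x} | " + "#" * int(f))
```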

10. The Mean and Standard Deviation. We have shown that
the terms of $N(q + p)^s$ give the expected frequency of success (with
respect to an attribute or character) in drawing N samples of s items
in each sample, where p is the probability of a success. We now
propose to characterize the distribution of expected frequencies by
finding the usual moments. In this procedure we may consider the
relative frequencies given by the terms of $(q + p)^s$ because the
absolute frequencies are proportional to these terms, N being the
proportionality factor. It will be convenient to evaluate first the $\nu$'s,
taking the position of the first term as origin.

By definition

$$\bar{x} = \nu_1 = \frac{\sum x f(x)}{\sum f(x)},$$

where x refers to the number of successes and f(x) refers to the
corresponding probabilities which are of course the theoretical
relative frequencies. Table 1 shows the appropriate frequency table.
It is obvious that the sum of the second column is unity. To sum
the third column we factor out sp, obtaining

$$sp\left[q^{s-1} + (s - 1)pq^{s-2} + C(s - 1, 2)p^2q^{s-3} + \cdots + C(s - 1, x - 1)p^{x-1}q^{s-x} + \cdots + p^{s-1}\right],$$

which may be written $sp(q + p)^{s-1} = sp$. Hence, we have that the
mean number of successes in s trials is $\bar{x} = sp$, where p is the probability
of success in any trial. This result is often called the “mathematical
expectation” or the “expected value” of x.

Table 1

$$
\begin{array}{c|c|c}
x & f(x) & x\,f(x) \\ \hline
0 & q^s & 0 \\
1 & s\,p\,q^{s-1} & s\,p\,q^{s-1} \\
2 & \dfrac{s(s-1)}{2!}\,p^2q^{s-2} & s(s-1)\,p^2q^{s-2} \\
3 & \dfrac{s(s-1)(s-2)}{3!}\,p^3q^{s-3} & \dfrac{s(s-1)(s-2)}{2!}\,p^3q^{s-3} \\
\vdots & \vdots & \vdots \\
x & \dfrac{s(s-1)\cdots(s-x+1)}{x!}\,p^xq^{s-x} & \dfrac{s(s-1)\cdots(s-x+1)}{(x-1)!}\,p^xq^{s-x} \\
\vdots & \vdots & \vdots \\
s & p^s & s\,p^s \\ \hline
\text{Totals} & \sum f(x) = (q+p)^s = 1 & \sum x f(x) = sp(q+p)^{s-1} = sp
\end{array}
$$

Table 1 assists our intuitions but logically it is unnecessary.
We could have proceeded as follows:

$$\nu_1 = \sum_{x=0}^{s} x\,\frac{s!}{x!\,(s - x)!}\,p^x q^{s-x} \div \sum_{x=0}^{s} \frac{s!}{x!\,(s - x)!}\,p^x q^{s-x}.$$

We observe that the divisor is unity and in the dividend we can
divide x into x!. So,

$$\nu_1 = \sum_{x=1}^{s} \frac{s!}{(x - 1)!\,(s - x)!}\,p^x q^{s-x}.$$

Factoring out sp, we have

$$\nu_1 = sp \sum_{x=1}^{s} \frac{(s - 1)!}{(x - 1)!\,(s - x)!}\,p^{x-1} q^{s-x} = sp(q + p)^{s-1},$$

whence we obtain

$$\bar{x} = sp. \tag{2}$$
We will use this procedure in finding the higher moments. Since

$$\sum_{x=0}^{s} f(x) = 1$$

we may omit it from the denominators of the rest of the $\nu$'s. By
definition then

$$\nu_2 = \sum_{x=0}^{s} x^2\,\frac{s!}{x!\,(s - x)!}\,p^x q^{s-x}.$$

Writing $x^2 = x(x - 1) + x$, we have

$$\nu_2 = \sum_{x=0}^{s} x(x - 1)\,\frac{s!}{x!\,(s - x)!}\,p^x q^{s-x} + \sum_{x=0}^{s} x\,\frac{s!}{x!\,(s - x)!}\,p^x q^{s-x}.$$

This simplifies into

$$\nu_2 = s(s - 1)p^2(q + p)^{s-2} + sp$$

so that we obtain

$$\nu_2 = s(s - 1)p^2 + sp.$$

In order to get $\sigma$ we must know the second moment about sp.
From the relation $\mu_2 = \nu_2 - \nu_1^2$ we easily find that

$$\mu_2 = spq$$

whence

$$\sigma = (spq)^{1/2}. \tag{3}$$
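
For the reader who wishes to see the intermediate algebra, the passage from the moments about the origin to the second moment about the mean may be written out as follows. This is a supplementary step of ours, using only $\nu_1 = sp$, $\nu_2 = s(s - 1)p^2 + sp$, and $q = 1 - p$:

$$
\begin{aligned}
\mu_2 &= \nu_2 - \nu_1^2 \\
      &= s(s - 1)p^2 + sp - s^2p^2 \\
      &= sp - sp^2 \\
      &= sp(1 - p) = spq.
\end{aligned}
$$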

Example 1. Find the mean and standard deviation of the binomial $(\tfrac{2}{5} + \tfrac{3}{5})^5$
by means of formulas (2) and (3). Verify your results by the usual procedure
for computing moments of a frequency distribution.

Solution. Here p = 3/5, q = 2/5, s = 5. By formulas (2) and (3), $\bar{x} = 3$,
$\sigma = 1.095$.

Verification.

$$\left(\tfrac{2}{5} + \tfrac{3}{5}\right)^5 = \frac{1}{5^5}\,[32 + 240 + 720 + 1080 + 810 + 243].$$

In finding the moments we may omit the proportionality factor $1/5^5$.

$$
\begin{array}{c|c|c}
x & f & u \\ \hline
0 & 32 & -3 \\
1 & 240 & -2 \\
2 & 720 & -1 \\
3 & 1080 & 0 \\
4 & 810 & 1 \\
5 & 243 & 2
\end{array}
$$

We find $\sum f = 3125$, $\sum uf = 0$, $\sum u^2 f = 3750$. Hence
$\bar{u} = 0$, $\bar{x} = x_0 + c\bar{u} = 3$, $\mu_2 = 1.2$, $\sigma_x = \sigma_u = \sqrt{1.2} = 1.095$.
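
The verification in Example 1 can also be carried out by machine. The sketch below is ours: it builds the terms of $(q + p)^s$ for s = 5 and p = 3/5, computes the mean and the second moment about the mean directly from their definitions, and compares them with formulas (2) and (3). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, sqrt

def binomial_terms(s, p):
    """Relative frequencies f(x), x = 0, 1, ..., s, from (q + p)^s."""
    q = 1 - p
    return [comb(s, x) * p**x * q**(s - x) for x in range(s + 1)]

def mean_and_variance(f):
    """Mean and second moment about the mean of a relative frequency table."""
    mean = sum(x * fx for x, fx in enumerate(f))
    var = sum((x - mean) ** 2 * fx for x, fx in enumerate(f))
    return mean, var

s, p = 5, Fraction(3, 5)
mean, var = mean_and_variance(binomial_terms(s, p))

print(mean, var, round(sqrt(var), 3))    # 3 6/5 1.095
print(mean == s * p, var == s * p * (1 - p))   # True True
```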
11. Skewness and Kurtosis. We shall now derive expressions
for the third and fourth moments. By definition

$$\nu_3 = \sum_{x=0}^{s} x^3\,\frac{s!}{x!\,(s - x)!}\,p^x q^{s-x}.$$

Writing $x^3 = x(x - 1)(x - 2) + 3x^2 - 2x$, we have

$$
\begin{aligned}
\nu_3 &= \sum_{x=0}^{s} x(x - 1)(x - 2)\,\frac{s!}{x!\,(s - x)!}\,p^x q^{s-x}
       + 3\sum_{x=0}^{s} x^2\,\frac{s!}{x!\,(s - x)!}\,p^x q^{s-x}
       - 2\sum_{x=0}^{s} x\,\frac{s!}{x!\,(s - x)!}\,p^x q^{s-x} \\
      &= s(s - 1)(s - 2)p^3(q + p)^{s-3} + 3[s(s - 1)p^2 + sp] - 2sp \\
      &= s(s - 1)(s - 2)p^3 + 3s(s - 1)p^2 + sp.
\end{aligned}
$$

Similarly, by definition

$$\nu_4 = \sum_{x=0}^{s} x^4\,\frac{s!}{x!\,(s - x)!}\,p^x q^{s-x}.$$

Writing $x^4 = x(x - 1)(x - 2)(x - 3) + 6x^3 - 11x^2 + 6x$ and proceeding
in a way analogous to that for evaluating $\nu_3$ we obtain

$$\nu_4 = s(s - 1)(s - 2)(s - 3)p^4 + 6\nu_3 - 11\nu_2 + 6\nu_1.$$

Next we desire the moments about the mean, so that we may obtain
expressions for skewness and kurtosis. From the relations

$$\mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3$$

$$\mu_4 = \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4$$

we obtain the quite simple results

$$\mu_3 = spq(q - p)$$

$$\mu_4 = spq[1 + 3(s - 2)pq].$$
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Recalling that ar = we have finally that 


(g ~~ V) 
Vspq 



spq s 


3. 


We observe that none of these moments are subject to Sheppard's 
corrections because the assumption that all the values are concen- 
trated at the mid-point of an interval is actually true in the case of 
a binomial distribution. This is obvious graphically since each 
frequency is represented by the middle of a rectangle. 
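
The closed forms for the mean, $\mu_2$, $\mu_3$, $\mu_4$, $\alpha_3$ and $\alpha_4$ can be checked numerically by summing over the terms of the binomial directly. The following is a minimal sketch in Python, not part of the original text; the function names are illustrative, and the values $s = 5$, $p = \tfrac35$ are those of Example 1.

```python
from math import comb, sqrt

def binomial_terms(s, p):
    """Terms C(s, x) p^x q^(s-x) of (q + p)^s for x = 0, 1, ..., s."""
    q = 1.0 - p
    return [comb(s, x) * p**x * q**(s - x) for x in range(s + 1)]

def moment_about(terms, origin, k):
    """k-th moment, per unit frequency, about the given origin."""
    return sum((x - origin) ** k * f for x, f in enumerate(terms))

s, p = 5, 3 / 5                      # the binomial of Example 1
q = 1.0 - p
terms = binomial_terms(s, p)

mean = moment_about(terms, 0.0, 1)   # first moment about the origin
mu2 = moment_about(terms, mean, 2)
mu3 = moment_about(terms, mean, 3)
mu4 = moment_about(terms, mean, 4)
sigma = sqrt(mu2)
alpha3 = mu3 / sigma**3
alpha4 = mu4 / sigma**4

# Direct sums side by side with the closed forms of the text.
print("mean  ", mean, s * p)
print("mu2   ", mu2, s * p * q)
print("mu3   ", mu3, s * p * q * (q - p))
print("mu4   ", mu4, s * p * q * (1 + 3 * (s - 2) * p * q))
print("alpha3", alpha3, (q - p) / sqrt(s * p * q))
print("alpha4", alpha4, 1 / (s * p * q) - 6 / s + 3)
```

Each printed pair should agree to within rounding error; changing `s` and `p` gives the same check for any other binomial.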

12. A Recursion Formula. The moments $\mu_k$ of a Bernoulli distribution
can be obtained in an elegant manner by means of the recursion formula

\[ \mu_{k+1} = pq \left[ sk\,\mu_{k-1} - \frac{d\mu_k}{dq} \right]. \tag{4} \]

We know that $\mu_0 = 1$ and $\mu_1 = 0$, so the formula is to be used for
$k \geq 1$. Thus for

\[ k = 1, \quad \mu_2 = pq\,(s\mu_0 - 0) = spq. \]

\[ k = 2, \quad \mu_3 = pq\,[0 - (s - 2sq)] = spq\,(2q - 1) = spq\,(q - p). \]

\[ k = 3, \quad \mu_4 = pq\,[3s^2pq + s - 6sq + 6sq^2] = spq\,[1 + 3spq - 6pq] = spq\,[1 + 3(s-2)pq]. \]

A simple proof of this formula has been given by A. T. Craig in the
Bulletin of the American Mathematical Society, vol. 40, pp. 262-264.
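
The recursion lends itself to mechanical computation. The sketch below is an illustration only, not Craig's proof and not part of the original text. It assumes that each $\mu_k$ is written as a polynomial in $q$ alone (putting $p = 1 - q$) before differentiating, and it stores each polynomial as a list of coefficients, lowest power first. All helper names are hypothetical.

```python
from fractions import Fraction

def poly_add(a, b):
    """Sum of two polynomials given as coefficient lists."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def poly_scale(a, c):
    """Polynomial a multiplied by the constant c."""
    return [c * x for x in a]

def poly_mul(a, b):
    """Product of two polynomials."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_diff(a):
    """Derivative with respect to q."""
    return [i * a[i] for i in range(1, len(a))] or [0]

def poly_eval(a, q):
    """Value of the polynomial at q."""
    return sum(c * q**i for i, c in enumerate(a))

def central_moments(s, k_max):
    """[mu_0, ..., mu_k_max] as polynomials in q, built from formula (4)."""
    pq = [0, 1, -1]                  # p*q = q - q^2 when p = 1 - q
    mus = [[1], [0]]                 # mu_0 = 1, mu_1 = 0
    for k in range(1, k_max):
        bracket = poly_add(poly_scale(mus[k - 1], s * k),
                           poly_scale(poly_diff(mus[k]), -1))
        mus.append(poly_mul(pq, bracket))
    return mus

s, q = 10, Fraction(2, 5)            # hypothetical values for illustration
p = 1 - q
mus = central_moments(s, 4)
mu2, mu3, mu4 = (poly_eval(mus[k], q) for k in (2, 3, 4))
print(mu2, s * p * q)
print(mu3, s * p * q * (q - p))
print(mu4, s * p * q * (1 + 3 * (s - 2) * p * q))
```

Exact fractions are used so that the recursion and the closed forms can be compared without rounding.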

To summarize, we have the important characterizing functions of
a Bernoulli distribution:

Mean:       $\bar{x} = sp$
Variance:   $\sigma^2 = spq$
Skewness:   $\alpha_3 = (q - p)/\sigma$
Kurtosis:   $\alpha_4 = 1/\sigma^2 - 6/s + 3$
Excess:     $\alpha_4 - 3 = (1 - 6pq)/spq$.

13. Mathematical Expectation. If a variable $x$ may assume any
one of a countable set of mutually exclusive values $x_1, x_2, \ldots, x_n$,
in such a way that $f(x_i)$, which we take to be single-valued and
non-negative, is the probability that $x$ takes the value $x_i$ and such that

\[ \sum_{1}^{n} f(x_i) = 1, \]

$x$ is called a chance variable and $f(x)$ is defined as
the probability function of the discrete variable $x$. If the mutually
exclusive values are $0, 1, 2, 3, \ldots, s$, an example of such a law of
probability is

\[ f(x) = C(s, x)\, p^x q^{s-x}. \]

A frequency distribution whose relative frequencies are given in
accord with this law of probability is styled a Bernoulli distribution,
as we have already observed.

Let the discrete variable $x$ be subject to the law of probability
$f(x)$ and let $g(x)$ be any function of $x$. The mathematical expectation
of $g(x)$, denoted by application of the operator $E$, is then defined to be

\[ E[g(x)] = \sum_{i=1}^{n} g(x_i) f(x_i). \]

In particular, if $g(x) = x$ then

\[ E(x) = \sum_{i=1}^{n} x_i f(x_i) = \bar{x} \]

is the first moment, per unit frequency, about the origin. More
generally, if $g(x) = x^k$, $(k = 1, 2, \ldots)$, then

\[ E(x^k) = \sum_{i=1}^{n} x_i^k f(x_i) = \nu_k \]

is the $k$th moment about the origin. If $f(x) = C(s, x)\, p^x q^{s-x}$ and
$g(x) = x^k$, then

\[ E(x^k) = \sum_{x=0}^{s} x^k\, C(s, x)\, p^x q^{s-x} \tag{5} \]

defines the moments, about the origin, of a Bernoulli distribution.
In particular for $k = 1$, we have

\[ E(x) = sp. \tag{6} \]

If $g(x) = (x - sp)^k$ and $f(x) = C(s, x)\, p^x q^{s-x}$, then

\[ E[(x - sp)^k] = \sum_{x=0}^{s} (x - sp)^k\, C(s, x)\, p^x q^{s-x} \]

is the $k$th moment about the mean. For $k = 1$, we see that $E(x - sp) = 0$,
and for $k = 2$, we have

\[ \sigma_x^2 = E(x - sp)^2 = E(x^2) - (sp)^2 \]

which we have seen reduces to

\[ E(x - sp)^2 = spq. \tag{7} \]

Equations (6) and (7) give the mean and variance with respect to
the number of successes $x$ in $s$ trials. In some statistical investigations
the data are expressed in terms of percentages or rates. When
we may assume a constant probability underlying the frequency
ratios obtained from observations we have a binomial distribution
as before but on a different scale. Instead of the variable being $x$
it is now $x/s$. In this case we have

\[ E\!\left(\frac{x}{s}\right) = \frac{1}{s}\,E(x) = p. \tag{8} \]

For the analogous concept relating to the variance we have

\[ E\!\left(\frac{x}{s} - p\right)^2 = \frac{1}{s^2}\,E(x - sp)^2 = \frac{pq}{s}. \tag{9} \]

Therefore, we see from (6) and (7) that the number of successes per
set of $s$ trials is distributed about an expected value of $sp$ with a
standard deviation of $(spq)^{1/2}$. From (8) and (9) we see that the
percentage of successes in a set of $s$ trials is distributed about an
expected value of $p$ with a standard deviation of $(pq/s)^{1/2}$.

In probability theory, the standard deviation is often called the
standard error. It is important to observe that for a fixed value of $p$
the standard error of $x$ about $sp$ increases as $s$ increases and is
proportional to $s^{1/2}$, whereas the standard error of $x/s$ about $p$ decreases
as $s$ increases, since it is proportional to $(1/s)^{1/2}$.
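
The opposite behavior of the two standard errors is easily seen numerically. The following Python sketch, with hypothetical function names and the illustrative value $p = .2$, applies the operator $E$ by direct summation over the Bernoulli law and prints the standard error of $x$ and of $x/s$ for several values of $s$.

```python
from math import comb, sqrt

def expectation(g, values, f):
    """E[g(x)]: the sum of g(x_i) f(x_i) over the listed values."""
    return sum(g(x) * f(x) for x in values)

def bernoulli_law(s, p):
    """Return f(x) = C(s, x) p^x q^(s-x) as a function of x."""
    q = 1.0 - p
    return lambda x: comb(s, x) * p**x * q**(s - x)

def standard_errors(s, p):
    """Standard error of x about sp, and of x/s about p."""
    f = bernoulli_law(s, p)
    xs = range(s + 1)
    var_count = expectation(lambda x: (x - s * p) ** 2, xs, f)
    var_ratio = expectation(lambda x: (x / s - p) ** 2, xs, f)
    return sqrt(var_count), sqrt(var_ratio)

for s in (25, 100, 400):
    se_x, se_ratio = standard_errors(s, 0.2)
    print(s, round(se_x, 4), round(se_ratio, 4))
```

Each fourfold increase in $s$ should double the first column of standard errors and halve the second, in agreement with $(spq)^{1/2}$ and $(pq/s)^{1/2}$.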

Exercises 

1. Expand the binomial N{\ + §)* for s = 2 and s = 8 . Find the theoretical 

frequencies in each case by taking N as the smallest number necessary to 
express the terms of each expansion as integers. 

2. Find the mean and standard deviation for each of the above distributions 

using the appropriate formulas in (4). 

3. Find $\bar{x}$, $\sigma$, $\alpha_3$, $\alpha_4$ for each of the following binomials:

4. For a certain binomial distribution

$\sigma = 2.66$, $\alpha_3 = 0.318$. Find $p$, $q$, and $s$.

5. Assume that .04 is the theoretical rate of mortality in a certain age group.
Suppose an insurance company is carrying $s = 1000$ such cases. What is
the expected dispersion (standard error) in death rates from the theoretical
rate $p = .04$? What would it be if $s = 10,000$?

6. The value of $x$ for which $C(s, x)\, p^x q^{s-x}$ is the largest is called the mode of a
Bernoulli distribution. Show that the mode is the positive integral value
(or values) of $x$ for which

\[ sp - q \leq x \leq sp + p. \]

References : 

1. Mathematical Theory of Probabilities — Fisher, pp. 99-101. 

2. Mathematical Statistics — Rietz, p. 25. 

7. Suppose the law of distribution of the happening of an event in s successive 

trials is given by the terms of the expansion of 

\[ (q + p)^s = \sum_{x=0}^{s} C(s, x)\, p^x q^{s-x} = \sum_{x=0}^{s} P_x. \]

(а) If 8 = 100 what values of p and q will make Po — Pi, = Pio? 

(b) Give approximate values of the P's in (a).

8. A bag contains three one dollar bills and four five dollar bills. Three bills 

are drawn at random. For each one dollar bill withdrawn, three two 
dollar bills are returned to the bag, and for each five dollar bill that is 
drawn, a one and a two and a ten dollar bill are returned to the bag. A 
second drawing of two bills is made. Designate by $x$ and $y$, respectively,
the values of the first and second drawings. (a) Give in tabular form the
probabilities for each of the possible simultaneous values of $x$ and $y$.
(b) Evaluate $E(x)$ and $E(y)$.

Solution. (a) The required probabilities are given in the cells of the
table on page 19. The marginal totals are denoted by $g(x_i)$ and
$h(y_j)$. The fact that $\sum_{1}^{n} g(x_i) = 1$ and $\sum_{1}^{m} h(y_j) = 1$ is a check on the
computations. (b) $E(x) = \sum_{1}^{n} x_i\, g(x_i) = 26{,}910/2730 = \$9.86$,
$E(y) = \sum_{1}^{m} y_j\, h(y_j) = 18{,}120/2730 = \$6.64$.

9. A bag contains three one dollar bills and two two dollar bills. Two bills are 

drawn at random. For each one dollar bill drawn two two dollar bills are 
returned to the bag, while for each two dollar bill drawn a one and a two
dollar bill are returned to the bag. A second drawing of two bills is made.
Designate by $x$ and $y$, respectively, the values of the first and second
drawings. Give in tabular form the probabilities for each of the possible
simultaneous values of $x$ and $y$. Find $E(x)$ and $E(y)$.

10. For the more advanced student: Read and report on the following article,
Rietz, Urn Schemata as a Basis for the Development of Correlation Theory,
Annals of Mathematics, (2), vol. 21 (1920), p. 306.


14. Approximating the Binomial with the Normal Curve. If we
plot the terms of $(q + p)^s$ as ordinates against the values of $x/\sqrt{s}$
as abscissas and draw the corresponding histogram we find that it
approaches a smooth curve as $s$ is taken larger and larger. Thus in
Figure 2 (where the vertical sides of the rectangles are omitted since
they contribute nothing to the interpretation) we see how the staircase
outline of the histogram approaches close to a continuous curve
as $s$ is taken larger.

Fig. 2. Showing Approach of $(q + p)^s$ to Smooth Curve as $s \to \infty$

The limiting values of $\alpha_3$ and $\alpha_4$ for the binomial as $s \to \infty$ are
those of the normal curve. Thus from

\[ \alpha_3 = \frac{q - p}{\sqrt{spq}} \quad \text{and} \quad \alpha_4 = \frac{1}{spq} - \frac{6}{s} + 3 \]

we see that $\alpha_3 \to 0$ and $\alpha_4 \to 3$ as $s \to \infty$. This
suggests the possibility of approximating the binomial
with the normal curve. As a matter of fact, it can be
proved, under certain conditions of approximation,
that $(q + p)^s$ approaches the normal curve as a limit
as $s \to \infty$. The proof* will not be given here but a
word or two about it may be appropriate. In using
the normal curve to approximate the binomial we are particularly
interested in a range of three or four standard deviations from the
mean. This fact suggests the reasonableness of assuming that the
number of successes $x'$ above or below $sp$ be considered as the same
order of magnitude as $\sigma$. This means that $x'/(spq)^{1/2}$ shall remain
finite as $s \to \infty$. Now $(spq)^{1/2}$ is of order $s^{1/2}$ if neither $p$ nor $q$
is extremely small. Hence the propriety of assuming (in the proof)
that $x'/s^{1/2}$ shall remain finite. This is the reason for plotting the
histograms (Figure 2) in terms of $x/s^{1/2}$.

We may expect, therefore, that the fitted normal curve will give
a fair approximation to the binomial except possibly at the extremities
of the range. When the terms of the binomial are arranged
symmetrically with respect to the mean, that is, when $p = q$, the
approximation is rather better than otherwise.

* The following references are recommended:

Rietz, Mathematical Statistics, pp. 32-35.

Fry, Probability and Its Engineering Uses, pp. 207-213.

Annals of Mathematical Statistics, vol. 1, p. 197.

Exercise

Fit a normal curve to the binomial (J + Directions: This binomial may 
be written 

'Efix) where /(a:) = C(18, ^ • 

(See Problem 8, §5.) Next recall that the equation of the normal curve is

\[ y = \frac{N}{\sigma}\,\phi(t), \quad \text{where } \phi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2} \text{ and } t = \frac{x - \bar{x}}{\sigma}. \]

If we set $N = 1$, $\bar{x} = sp$, and $\sigma = (spq)^{1/2}$ we shall expect that $y$ will give,
approximately, the values of $f(x)$ for the various values of $x$. As in Chapter VI of
Part I the following outline is suggested for organizing the computations.

      $x$        $t$        $\phi(t)$        $y$        $f(x)$

Construct the histogram and draw the curve. It is suggested that paper ruled
"20 to the inch" be used. By comparing the last two columns and also judging
from the figure, does the fit seem to be good, even though s is rather small and 
q = ip? 

The above exercise will help the student appreciate a theorem 
which will now be introduced. The sum of successive terms of the 
binomial equals the area of the corresponding rectangles in its histo- 
gram. We may obtain an approximation to this sum by finding the 
area under the fitted normal curve which these rectangles occupy. 
Graphically, the values $x = 0, 1, 2, \ldots, s$ are the mid-points of the
bases of these rectangles. Therefore, if we are summing the terms
of the binomial in which $x$ ranges from $x = d_1$ to $x = d_2$ inclusive,
the corresponding area under the curve will be from $x = d_1 - \tfrac12$ to
$x = d_2 + \tfrac12$. We must convert these values into standard units in
order to enter a table of areas of the normal curve. Hence we have
the following theorem.

Theorem IX.* The sum of those terms of the binomial $(q + p)^s$ in
which the number of successes $x$ ranges from $d_1$ to $d_2$, inclusive, is
approximately

\[ Q = \int_{t_1}^{t_2} \phi(t)\, dt \]

where

\[ t_1 = \frac{d_1 - \tfrac12 - sp}{\sigma}, \qquad t_2 = \frac{d_2 + \tfrac12 - sp}{\sigma}, \qquad \sigma = (spq)^{1/2}. \]

* Sometimes called the De Moivre-Laplace Theorem.


Example 2. In tossing six coins what is the probability of obtaining 2, 3, 4, 
or 5 heads? 

Solution. We have $sp = 3$, $\sigma^2 = \tfrac32$, $d_1 = 2$, $d_2 = 5$. Hence, $t_1 = -1.5/(\tfrac32)^{1/2}
= -1.225$, $t_2 = 2.5/(\tfrac32)^{1/2} = 2.041$. Therefore,

\[ Q = \int_{-1.225}^{2.041} \phi(t)\,dt = \int_{0}^{1.225} \phi(t)\,dt + \int_{0}^{2.041} \phi(t)\,dt = .38971 + .47932 = .869. \]

Although the use of Theorem IX assumes $s$ large we obtain here with $s$ small a
good approximation to the exact value $Q = \tfrac78 = .875$. In this example it would
have been a simple matter to evaluate and sum the terms of the binomial but
when $s$ is large and the range from $d_1$ to $d_2$ includes many terms this procedure
may be very laborious. When this is the case the above theorem gives an
approximation which may be quite satisfactory. The approximation is good if
$d_1$ lies on one side of the mean and $d_2$ on the other at approximately equal
distances.
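
The comparison made in Example 2 can be repeated for any binomial with a few lines of code. The sketch below is offered as an illustration, not as part of the original text; it obtains the normal area from the error function of Python's standard library instead of a printed table, and the function names are hypothetical.

```python
from math import comb, erf, sqrt

def normal_area(t1, t2):
    """Area under the unit normal curve between t1 and t2."""
    def cdf(t):
        return 0.5 * (1.0 + erf(t / sqrt(2.0)))
    return cdf(t2) - cdf(t1)

def theorem_ix(s, p, d1, d2):
    """Approximate sum of the terms of (q + p)^s for d1 <= x <= d2."""
    q = 1.0 - p
    sigma = sqrt(s * p * q)
    t1 = (d1 - 0.5 - s * p) / sigma   # lower limit in standard units
    t2 = (d2 + 0.5 - s * p) / sigma   # upper limit in standard units
    return normal_area(t1, t2)

def exact_sum(s, p, d1, d2):
    """Exact sum of the same terms of the binomial."""
    q = 1.0 - p
    return sum(comb(s, x) * p**x * q**(s - x) for x in range(d1, d2 + 1))

approx = theorem_ix(6, 0.5, 2, 5)     # six coins; 2, 3, 4, or 5 heads
exact = exact_sum(6, 0.5, 2, 5)
print(round(approx, 3), round(exact, 3))
```

For the six coins of Example 2 the two printed values should lie close to the .869 and .875 found above.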



Example 3. Suppose $p = .2$ is the probability of success in a single trial.
Estimate the probability of obtaining less than five or more than fifteen successes
in fifty trials.

Solution. The required probability, indicated by the shaded area in Figure 3, is
$P = 1 - Q$ where $Q$ is the probability of obtaining more than 4 and less than 16
successes. In using Theorem IX, we have

\[ sp = 10, \quad \sigma = 2.828, \quad t_1 = -1.944, \quad t_2 = 1.944. \]

Therefore,

\[ P = 1 - 2\int_{0}^{1.944} \phi(t)\,dt = .0519. \]

The exact probability is obtained by evaluating and adding the sixth to the
sixteenth terms of $(.8 + .2)^{50}$ and subtracting the result from unity. However,
instead of computing these terms separately, a systematic procedure may be set
up by which each term is made to depend upon the preceding term. Thus we
may write a binomial as follows:

\[ (q + p)^s = q^s (1 + k)^s = q^s \left[ 1 + sk + \frac{s(s-1)}{2!}\,k^2 + \frac{s(s-1)(s-2)}{3!}\,k^3 + \cdots \right] \]

where $k = p/q$. Then $q^s$ may be computed by logarithms and its product with the
terms in the brackets may be obtained on computing machines by a continuous
process. Thus for the terms within the brackets,

the second term is first term multiplied by $sk$,

the third term is second term multiplied by $\dfrac{s-1}{2}\,k$,

the fourth term is third term multiplied by $\dfrac{s-2}{3}\,k$,

the $r$th term is $(r-1)$st term multiplied by $\dfrac{s-(r-2)}{r-1}\,k$.

In this way we find $Q = .9497$, so the required probability is $P = .0503$. For
most practical purposes the approximation by use of Theorem IX would be
satisfactory.
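
The continuous process described in Example 3, in which each term is obtained from the preceding one, may be sketched in Python as follows. This is an illustration under the assumption of ordinary floating-point arithmetic in place of logarithms and a computing machine, and the function name is hypothetical. A machine evaluation of this kind need not agree with hand-computed figures in the last decimal place.

```python
def binomial_terms_by_recursion(s, p):
    """Terms of (q + p)^s, each found from the one before it."""
    q = 1.0 - p
    k = p / q
    term = q**s                       # first term, q^s
    terms = [term]
    for x in range(1, s + 1):         # x = number of successes
        # Multiplier (s - (x - 1)) / x * k carries the term for x - 1
        # successes into the term for x successes.
        term *= (s - x + 1) / x * k
        terms.append(term)
    return terms

terms = binomial_terms_by_recursion(50, 0.2)
Q = sum(terms[5:16])                  # sixth to sixteenth terms, x = 5, ..., 15
P = 1.0 - Q
print(round(Q, 4), round(P, 4))
```

The printed value of `P` may be set beside the estimate from Theorem IX to judge how satisfactory the normal approximation is for this binomial.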

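Example 4. Find the probability that in throwing 100 coins one will obtain a
number of heads which will differ from the expected number by less than five.

Solution. Here $sp = 50$, $\sigma = 5$, $d_1 = 46$, $d_2 = 54$, so that by Theorem IX
$t_1 = -.9$ and $t_2 = .9$. Therefore,

\[ Q = 2\int_{0}^{.9} \phi(t)\,dt = .632. \]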



Example 5. In the binomial $(.95 + .05)^{30}$ where $p = .05$ is
the probability of success in a
single trial, find the probability of as many as seven successes.

Solution. This binomial is too skew for a good fit with the normal curve, so
the first seven terms of the expansion are evaluated. (See Figure 4.) Their sum
is .9994 and this is the probability for less than seven successes. Therefore the
probability for seven or more successes is .0006.

Fig. 4. First Seven Terms of $(.95 + .05)^{30}$

15. Simple Sampling of Attributes. It is a matter of common
experience that certain fluctuations between observation and expectation
under a given hypothesis may be explained on the basis of chance.
For example, in throwing 100 coins an observed result of 45 heads
and 55 tails does not warrant the conclusion that the coins are biased.
In such cases a very natural question arises as to what sampling
deviations may be allowed before we conclude that they indicate the
operation of definite and assignable causes, i.e., that the results are
inconsistent with the given hypothesis. The theory dealing with
such fluctuations in relative frequencies is called sampling of
attributes.

Suppose we are given a sample of s individuals of which x have 
a certain character or attribute. The question then arises: Is this 
result consistent with the hypothesis that the sample is drawn from 
a population having the fraction p with the given character? Could 
it reasonably have arisen on the basis of chance or is it significant of 
other than chance factors? In answering this question our common- 
sense judgment is greatly aided by a probability scale for chance 
fluctuations under the given hypothesis. We therefore restate our 
question* more precisely as follows: 

Suppose the probability of an event is known from theoretical 
considerations to be equal to p. What is the probability that in s 
trials the number of successes will differ numerically from the
expected number $\bar{x} = sp$ by as much as (or more than) an observed
amount $d$?

The required probability may be estimated by means of the fol- 
lowing corollaries to Theorem IX. 

Corollary 1. The probability that the number of successes $x$ in $s$
trials will differ from the expected number $\bar{x} = sp$ by more than $|d|$ is
approximately given by $P_\delta = 1 - Q_\delta$ where

\[ Q_\delta = 2\int_{0}^{\delta} \phi(t)\,dt \quad \text{and} \quad \delta = \frac{|d| + \tfrac12}{\sigma}. \]

Corollary 2. If the words "more than" in Corollary 1 be replaced
by "as much as," then

\[ \delta = \frac{|d| - \tfrac12}{\sigma}. \]

The proofs are obvious if we admit that the normal curve fits the
histogram of the point binomial.

* See Camp, Problems in Sampling, Journal American Statistical Association,
p. 964, December, 1923.

In another slightly different form involving relative frequencies,
$Q_\delta$ gives approximations to the probability that the difference
between an observed relative frequency of success $x/s$ and the true
probability $p$ satisfies the relation

\[ \left| \frac{x}{s} - p \right| < \delta \left( \frac{pq}{s} \right)^{1/2} \]

for every assigned positive value of $\delta$.



In using Corollary 1, Table 2 gives a general idea of the magnitudes 
of probabilities for certain deviations. It is divided into two sections: 
the first section lists probabilities for specially selected deviations, 
the second section lists deviations for specially selected prob- 
abilities. 

Table 2. Abridged Normal Probability Scale

   Deviation     Chance of Deviation   |   Deviation     Chance of Deviation
   $\delta$      Outside $\pm\delta$   |   $\delta$      Outside $\pm\delta$
     0.5              .617             |     .67              .50
     1.0              .317             |    1.28              .20
     1.5              .134             |    1.64              .10
     2.0              .046             |    1.96              .05
     2.5              .0124            |    2.33              .02
     3.0              .0027            |    2.58              .01
     3.5              .00047           |    2.88              .004

A computed probability is used to scale our judgment as to whether
the deviation in question can be explained on the basis of chance.
If it cannot be so explained, it is said to be significant of other
than chance causes. In passing judgment on a deviation it is sometimes
difficult to give a definite answer. Good judgment in these
matters only comes from much experience in the particular field.
However, we shall not often be wrong if we draw the following
conventionalized conclusions about $P_\delta$ for a deviation outside $\pm\delta$:

If $P_\delta > .05$, $\delta$ is not significant,

If $P_\delta < .01$, $\delta$ is significant,

If $.05 > P_\delta > .01$, our conclusion about $\delta$ is doubtful and we cannot
say with much certainty whether the deviation is significant or
not until we have more information.

We see from Table 2 that this rule allows chance fluctuations to
explain a deviation from the expected value of as much as 2.58 in
standard units. In some situations it may be desirable to extend
this range and place the bounds of chance fluctuations at $\delta = \pm 3$.
There is then a correspondingly greater degree of certainty that
deviations outside these limits are significant.
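
Both sections of Table 2, and the conventional rule just stated, can be reproduced from the unit normal curve. The following sketch assumes the `NormalDist` class of Python's standard `statistics` module; the function names are hypothetical and the verdict labels merely restate the rule above.

```python
from statistics import NormalDist

unit_normal = NormalDist()            # mean 0, standard deviation 1

def chance_outside(delta):
    """P_delta: chance of a deviation outside +/- delta standard units."""
    return 2.0 * (1.0 - unit_normal.cdf(delta))

def deviation_for_chance(prob):
    """The delta whose chance of being exceeded, in either direction, is prob."""
    return unit_normal.inv_cdf(1.0 - prob / 2.0)

def verdict(p_delta):
    """Conventionalized conclusion drawn from P_delta."""
    if p_delta > 0.05:
        return "not significant"
    if p_delta < 0.01:
        return "significant"
    return "doubtful"

# First section of the table: probabilities for selected deviations.
for delta in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5):
    print(delta, round(chance_outside(delta), 5), verdict(chance_outside(delta)))

# Second section: deviations for selected probabilities.
for prob in (0.50, 0.20, 0.10, 0.05, 0.02, 0.01, 0.004):
    print(prob, round(deviation_for_chance(prob), 2))
```

The printed figures may be compared line by line with the entries of Table 2.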


Example 6. (Rietz) A group of scientific men reported 1705 sons and 1527
daughters. The examination of these numbers brings up the following fundamental
questions of simple sampling. Do these data conform to the hypothesis
that $\tfrac12$ is the probability that a child to be born will be a boy? That is, can the
deviations be reasonably regarded as fluctuations in simple sampling under this
hypothesis? In another form, what is the probability in throwing 3232 coins
that the number of heads will differ from $(3232/2) = 1616$ by as much as
$d = 1705 - 1616 = 89$?

Solution. $s = 3232$, $\sigma = 28.425$,

\[ \delta = \frac{89 - \tfrac12}{28.425} = 3.113, \qquad P_\delta = 1 - 2\int_{0}^{3.113} \phi(t)\,dt = 1 - .9981 = .0019. \]

Hence we conclude that these data cannot be explained on the basis of chance,
i.e., they are inconsistent with an hypothetical sex ratio of 1 : 1.

16. Probable Error. The word error is technically used in statistics
to denote a deviation from the expected value. The deviation $\delta$
for which $P_\delta = .5$ is commonly called "probable error." This term
is misleading because it is not the most probable error. Equally
likely deviation would be a more appropriate name for it.

From the normal probability scale we find that this deviation is
$\delta = .6745$ in standard units or $.6745\sigma$ in arbitrary units. Hence for
a normal distribution, probable error is equivalent to the quartile
deviation which, in Part I, we have called E in x units and s in
standard units. In other words, the probability is one-half that a
variate chosen at random will have a value within the range
$E(x) \pm .6745\sigma_x$. This definition of probable error combines the
assumption of a normal distribution with the specification of an
even wager.

Used as a scale unit along the $x$-axis, probable error is sometimes
simply defined as a yardstick which is approximately $\tfrac23\sigma$. This
definition does not impose the condition that the distribution necessarily
follow the normal curve. But there is no real gain in the removal
of this condition if, for an interpretation of the significance of
such a deviation, we must refer to a normal probability scale. That
is, in testing the significance of a discrepancy between an observed
value and the expected value there is no merit in expressing that
discrepancy in multiples of approximately $\tfrac23\sigma$ instead of $\sigma$ itself. It
would seem that the language of probable error should be abandoned.

17. Standard Error and Correlation of Errors in Class Frequencies.
When the probability distribution of a variable is known the
expected frequency in any class interval may be determined. Suppose
we have obtained from a random sample of an infinite distribution
an observed frequency distribution. The variates, $N$ in
number, should be distributed into $n$ class intervals containing
$\bar{f}_1, \bar{f}_2, \ldots, \bar{f}_n$ each. Instead of this suppose we find $f_1, f_2, \ldots, f_n$ where

\[ \sum_{1}^{n} f_i = \sum_{1}^{n} \bar{f}_i. \]

Let Table 3 represent the two distributions.

Table 3

   Class     Class Mark     Observed Frequency     Theoretical Frequency
     1         $x_1$              $f_1$                 $\bar{f}_1$
     2         $x_2$              $f_2$                 $\bar{f}_2$
    ...         ...                ...                     ...
     i         $x_i$              $f_i$                 $\bar{f}_i$
    ...         ...                ...                     ...
     n         $x_n$              $f_n$                 $\bar{f}_n$

Suppose next that a large number of such samples of $N$ variates
each are obtained under the same essential conditions. The observed
and expected distributions will not agree in practice unless
the samples are distributed exactly as the universe from which they
are drawn. In the above table, the $x$'s are to be regarded simply as
compartments and do not change. Only the frequencies change
from sample to sample. Any class frequency $f_s$ will vary from
sample to sample, and these values of $f_s$ will form a frequency
distribution.

It is important in certain problems to have an expression for the
expected value of the variance $\sigma_{f_s}^2$ of this distribution in terms of
observed values. To derive this expression we let $p_s = \bar{f}_s/N$ be the
probability that a variate will fall in the class $s$ and $q_s = 1 - p_s$ be
the probability that it will fall elsewhere. Then, considering the $N$
variates as observations or trials, the theoretical distribution of
frequency in this class will be given by $(q_s + p_s)^N$ and the square of the
standard deviation of $f_s$ in the theoretical distribution is given by

\[ \sigma_{f_s}^2 = N p_s q_s. \]

If we accept the observed relative frequency $f_s/N$ as an approximation
to $p_s$ then we have

\[ \sigma_{f_s}^2 = N\,\frac{f_s}{N}\left(1 - \frac{f_s}{N}\right) \]

which reduces to

\[ \sigma_{f_s}^2 = f_s\left(1 - \frac{f_s}{N}\right) \tag{10} \]

as an approximate* value of the desired expression.

* When the sample is small, researches have shown that a better approximation
can be obtained by multiplying the right side of (10) by $N/(N - 1)$. See
Rietz, Mathematical Statistics, pp. 120-122.

We will next consider the correlation between deviations from the
expected values of the frequencies in any two classes, say the $s$th
and $t$th. Let $\delta f_s$ be a deviation from the expected value or theoretical
mean of the $s$th class corresponding to a deviation $\delta f_t$ from
the expected value of the $t$th class. Since the total frequency is $N$,
$N - \bar{f}_s$ is the frequency which is distributed in classes other than the
$s$ class. If we obtain an excess $\delta f_s$ in the $s$ class then $-\delta f_s$ must be
distributed among the other classes. If deviations from the expected
values are due only to random sampling fluctuations it is
reasonable to assume that $-\delta f_s$ is distributed among the other classes
in proportion to their expected frequencies. Therefore, as the contribution
from the $f_t$ class, we have the proportion $\bar{f}_t/(N - \bar{f}_s)$ and
the number $(-\delta f_s)\,\bar{f}_t/(N - \bar{f}_s)$.

If the mean value of $\delta f_t$ equals $-\delta f_s\,\bar{f}_t/(N - \bar{f}_s)$, for $\delta f_s$ assigned,
then $-\bar{f}_t/(N - \bar{f}_s)$ must be the regression coefficient of $\delta f_t$ on $\delta f_s$.
Therefore,

\[ -\frac{\bar{f}_t}{N - \bar{f}_s} = r_{\delta f_s \delta f_t}\, \frac{\sigma_{\delta f_t}}{\sigma_{\delta f_s}} \]

so that

\[ r_{\delta f_s \delta f_t}\, \sigma_{\delta f_s} \sigma_{\delta f_t} = -\frac{\bar{f}_t}{N - \bar{f}_s}\, \sigma_{\delta f_s}^2 = -\frac{\bar{f}_t}{N(1 - p_s)}\, N p_s (1 - p_s) = -\bar{f}_t\, p_s = -\frac{\bar{f}_t \bar{f}_s}{N}. \]

Hence we have the result

\[ r_{f_s f_t}\, \sigma_{f_s} \sigma_{f_t} = -\frac{\bar{f}_t \bar{f}_s}{N}. \tag{11} \]

Clearly, $r_{\delta f_s \delta f_t} = r_{f_s f_t}$ and $\sigma_{\delta f_s} = \sigma_{f_s}$, $\sigma_{\delta f_t} = \sigma_{f_t}$, since the $\delta$'s
measure deviations from their expected frequencies.*

For an application of the above formula and the Bernoulli Theory
in general see Brown, The Use of Statistical Techniques in Certain Problems
of Market Research, Publication of the Graduate School
of Business Administration, Harvard University, vol. XXII, no.
3, 1935.
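
The results for the variance of a class frequency and for the product $r_{f_s f_t}\sigma_{f_s}\sigma_{f_t}$ can be checked exactly in a small case by enumerating every possible sample. The sketch below is an illustration resting on stated assumptions: three classes only, hypothetical probabilities $.5$, $.3$, $.2$, and $N = 12$; it uses the multinomial law directly, and the function names are hypothetical. It works with the theoretical frequencies $Np$, so the variance is compared with $Npq$ and not with the observed-value approximation (10).

```python
from math import factorial

def multinomial_probability(counts, probs):
    """Probability of the class frequencies `counts` in a sample of sum(counts)."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)         # exact at every step
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p**c
    return prob

def frequency_moments(N, probs):
    """Exact variance of f_1 and product-moment of f_1, f_2 (three classes)."""
    e1 = e2 = e11 = e12 = 0.0
    for a in range(N + 1):
        for b in range(N - a + 1):
            w = multinomial_probability((a, b, N - a - b), probs)
            e1 += a * w
            e2 += b * w
            e11 += a * a * w
            e12 += a * b * w
    return e11 - e1 * e1, e12 - e1 * e2

N, probs = 12, (0.5, 0.3, 0.2)
var_1, cov_12 = frequency_moments(N, probs)
theoretical_1 = N * probs[0]          # expected frequency of the first class
theoretical_2 = N * probs[1]          # expected frequency of the second class
print(var_1, theoretical_1 * (1 - theoretical_1 / N))
print(cov_12, -theoretical_1 * theoretical_2 / N)
```

Each printed pair should agree to within rounding error, the second pair being formula (11).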

* The correlation of errors here is properly a multivariate problem depending
on the multinomial distribution. The argument given above indicates the
plausibility of the result but it is not to be construed as a rigorous proof. By
means of more advanced mathematics the correlation coefficient can be proved
to have the result found without making use of the assumption that any excess
frequency in one class is distributed among the other classes in proportion to
their frequencies. In other words, the assumption is superfluous.

18. The Poisson Exponential. If $p$ (or $q$) is small the normal
curve cannot ordinarily be used with confidence to approximate the
terms of the binomial $(q + p)^s$. If $s$ is large but $sp$ is in the
neighborhood where $x$ is small, a useful approximation to

\[ f(x) = \frac{s!}{x!\,(s-x)!}\, p^x q^{s-x} \tag{12} \]

may be given by means of the Poisson exponential function. Statistical
examples of this situation are sometimes called rare events
and occur in widely different fields; for example, the number born
blind per year in a large city, the number of organisms of a given
size S on a given glass slide that escape death by X-rays after being
exposed for $t$ seconds, the number of times in a certain year that the
volume of trading on the New York Stock Exchange exceeds M
million shares, the frequency of certain "peaks" in a given time
interval such as occur in telephone traffic, and other problems in
demands for services.

Suppose, then, that $p$ is the probability for the occurrence of
the rare event in question and assume that $q = 1 - p$ is nearly
unity. Let $s$ be so large that $s!$ and $(s - x)!$ may be replaced
by their Stirling approximations [cf. (12) of Chapter II]. Making
these replacements, (12) becomes

\[ f(x) = \frac{s^{s+1/2}\, e^{-s}\, p^x q^{s-x}}{x!\,(s-x)^{s-x+1/2}\, e^{-(s-x)}}. \tag{13} \]

Writing the second factor in the denominator of (13) in the form
$s^{s-x+1/2}\,(1 - x/s)^{s-x+1/2}$, it is readily seen that (13) becomes

\[ f(x) = \frac{(sp)^x\, (1-p)^{s-x}\, e^{-x}}{x!\,\left(1 - \dfrac{x}{s}\right)^{s-x+1/2}}. \]

Now when $x$ is small and $s$ is large,*

\[ \left(1 - \frac{x}{s}\right)^{s-x+1/2} \approx e^{-x} \]

and

\[ (1 - p)^{s-x} \approx (1 - p)^s \approx e^{-sp}. \]

* The symbol $\approx$ is used to mean "approximately equal."

For the required approximation we have, therefore,

\[ f(x) = \frac{m^x e^{-m}}{x!} \tag{14} \]

where $m = sp$. This is Poisson's exponential function. It is tabulated
for various values of $m$ and $x$ in Tables for Statisticians and
Biometricians. The terms of the series

\[ e^{-m}\left[ 1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \cdots + \frac{m^x}{x!} \right] \tag{15} \]

give the probability of exactly $0, 1, 2, \ldots,$ or $x$ occurrences of the
rare event in $s$ trials. It is worthy of note that the Poisson exponential
has only one parameter, $m$, whereas the normal curve has two
parameters, the mean and $\sigma$.

Certain simple and interesting results may be obtained for the
moments of the distribution given by (14) when $x$ takes all integral
values from $x = 0$ to $x = s$. First we observe that when $x = s$
in (15) we have

\[ \sum_{x=0}^{s} \frac{m^x e^{-m}}{x!} = 1 \]

approximately if $s$ is large. Then

\[ E(x) = \nu_1 = \sum_{x=0}^{s} x f(x) = m e^{-m}\left[ 1 + m + \frac{m^2}{2!} + \cdots + \frac{m^{s-1}}{(s-1)!} \right] \approx m e^{-m} e^{m} = m = sp, \text{ approximately.} \]

And

\[ \nu_2 = \sum_{x=0}^{s} x^2 f(x) = \sum_{x=0}^{s} [x(x-1) + x]\, f(x) = m(m + 1), \text{ approximately.} \]

From these results, we have

\[ \text{Mean} = m = sp \]

\[ \sigma^2 = m(m + 1) - m^2 = m, \qquad \sigma = m^{1/2}. \]

It may also be shown that

\[ \nu_3 = m(m^2 + 3m + 1) \]

\[ \nu_4 = m(m^3 + 6m^2 + 7m + 1) \]

whence we find that

\[ \mu_3 = m, \qquad \mu_4 = 3m^2 + m \]

and

\[ \alpha_3 = \frac{1}{m^{1/2}}, \qquad \alpha_4 = 3 + \frac{1}{m}. \]

It is a rather striking result that each of the mean, variance, and
$\mu_3$ is equal to $m$.

The importance of the Poisson approximation in dealing with
certain problems in telephone engineering and other fields is discussed
in Fry's book, Probability and Its Engineering Uses. The
interested student might investigate and prepare a special report on
some of these applications.
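
The closeness of the Poisson exponential to the binomial for a rare event, and the equality of the mean, variance, and third moment about the mean, can be seen numerically. The following Python sketch is illustrative only; the values $s = 1000$ and $p = .002$ are hypothetical, and the moment sums are carried to $x = 59$, beyond which the terms are negligible for this $m$.

```python
from math import comb, exp, factorial

def binomial_term(s, p, x):
    """Exact term C(s, x) p^x q^(s-x)."""
    return comb(s, x) * p**x * (1.0 - p) ** (s - x)

def poisson_term(m, x):
    """Poisson exponential m^x e^(-m) / x!."""
    return m**x * exp(-m) / factorial(x)

s, p = 1000, 0.002                    # a hypothetical rare event
m = s * p
for x in range(6):
    print(x, round(binomial_term(s, p, x), 5), round(poisson_term(m, x), 5))

# Moments of the Poisson terms by direct summation.
xs = range(60)
mean = sum(x * poisson_term(m, x) for x in xs)
mu2 = sum((x - mean) ** 2 * poisson_term(m, x) for x in xs)
mu3 = sum((x - mean) ** 3 * poisson_term(m, x) for x in xs)
mu4 = sum((x - mean) ** 4 * poisson_term(m, x) for x in xs)
print(mean, mu2, mu3, mu4, 3 * m**2 + m)
```

The first three moments printed should each be close to $m$, and the fourth close to $3m^2 + m$.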


Problems 

1. Use Theorem IX to approximate the following sums:

(a) the terms of (i + in which 50 ^ x 70. 

(b) the terms of (.946 + .054) in which x ^ 34. 

2. Fit a normal curve to the point binomial (| + |)^.

3. Fit a normal curve to (i + J)^. 

4. Suppose you are studying IQ's and it is known that 20% in the universe with
which you are dealing have an IQ below $M$, so that $\tfrac15$ is the probability
that an individual chosen at random has an IQ below $M$. ($M$ itself has
no bearing on the solution of the problem.) If a teacher had a class of
fifty which could be regarded as a random sample from this universe,
would it be exceptional if she found less than five or more than fifteen
with IQ's below $M$? (See Example 3.)

5. Vital statistics gathered over a long period of time indicate that 5% of
patients suffering from a certain disease die from that disease. Suppose
that out of 30 cases examined in a certain city seven deaths were reported.
Was this unusual? (See Example 5.)

6. (Camp) A dean's report showed the following figures:

                     Honor Grades         Failures         Number
   Subject          Number      %       Number      %      Examined
   German             187      36         33       6.3       521
   Mathematics        162      35         38       8.2       466
   Music               11      50          0       0.0        22
   All Subjects                38                  5.4

Taking $p = .38$ for honor grades and $p = .054$ for failures find the
probability: (a) that in selecting 521 at random (from a supposedly infinite
number), one would obtain as few honor grades as were obtained in German;
(b) as many failures; (c) in selecting 466 at random, one would obtain as
few honor grades as were obtained in mathematics; (d) as many failures;
(e) in selecting 22 at random, one would obtain no failures (as in music);
(f) eleven or more honor grades.

Hints. (a) Find sum of terms of $(.62 + .38)^{521}$ in which $x \leq 187$.

(b) See Problem 1 (b) above.

(e) Evaluate $(.946)^{22}$ by logarithms.

7. (Burgess) If analyzed past experience shows that 4% of all insured white
males of exact age 65 have died within a year, and it is found that 60 of a
similar group of 1000 actually die within a year, should the group be
regarded as essentially different from the general mass; that is, is the
departure from the expected mortality greater than might be expected as
a result of chance variation alone?

8. (Richardson) In a coin tossing experiment in which a coin was tossed 400
times, 250 heads appear. Do you believe the experiment was honestly
performed?

9. (Lovitt and Holtzclaw) Would you be willing to bet 10 to 1 that an opponent
could not throw the sum 7 with two dice at least 23 times in a hundred
throws with two dice?

10. (Lovitt and Holtzclaw) The 1919 report of the Census Bureau in its bulletin
on Mortality Statistics shows the average death rate from tuberculosis (all
forms) for the period 1906-1910 to be 163.5 per 100,000 of population
and $\sigma = 12.78$.

In the following instances is the variation from the average such as to
justify one in constructing a theory as to the causes of this variation?

    California        210.4
    Colorado          244.2
    Michigan           99.7
    N. Y. Bronx       445.7
    Scranton, Pa.      97.4




11. A sociologist who is interested in the characteristics of a certain race which
we will call R, hit on the idea of trying to sort R's from non-R's in the
writings of unknown persons. Accordingly he persuaded a colleague to
let him have 64 examination papers, with names removed, from psychology
classes at Blank University. On 43 of these papers he correctly spotted
the students as R's or non-R's. In 21 cases he missed. Find the
probability of this performance having resulted from pure chance.

12. A coin is tossed s times. It is desired that the relative frequency of the
appearance of heads shall not be greater than .51 or less than .49. Find
the smallest value of s that will insure the above results with a degree of
certainty $Q_\delta \ge .90$.

Solution. We must determine s such that $Q_\delta = .90$ (at least) that

    $\left| \frac{x}{s} - \frac{1}{2} \right| \le .01.$

We have

    $\delta = \frac{.01\,s}{(s/4)^{1/2}} = .02\sqrt{s}$

since $p = q = \frac{1}{2}$. Also

    $Q_\delta = \int_{-\delta}^{\delta} \phi(t)\,dt = .90$

whence from the tables, we find $\delta = 1.645$. Therefore,

    $.02\sqrt{s} = 1.645$

and

    $s = 6765.$
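The arithmetic of this solution can be sketched in a few lines of Python. The function name `tosses_needed` and its parameters are illustrative assumptions, not part of the text; the normal approximation is assumed throughout, and the limit $\delta$ is taken from the standard library instead of from printed tables, so the result may differ by a unit or so from a value worked with $\delta = 1.645$ (for which $.02\sqrt{s} = \delta$ gives s close to 6765).

```python
import math
from statistics import NormalDist


def tosses_needed(tolerance: float, certainty: float, p: float = 0.5) -> int:
    """Smallest s for which |x/s - p| <= tolerance holds with the given
    certainty, using the normal approximation to the binomial."""
    # Limit delta such that the area under the unit normal curve between
    # -delta and +delta equals `certainty`.
    delta = NormalDist().inv_cdf(0.5 + certainty / 2.0)
    # delta = tolerance * s / sqrt(s p q)  gives  sqrt(s) = delta sqrt(p q) / tolerance.
    root_s = delta * math.sqrt(p * (1.0 - p)) / tolerance
    return math.ceil(root_s ** 2)


print(tosses_needed(0.01, 0.90))
```

Tightening the tolerance from .01 to .002 multiplies the required number of tosses by about 25, since s varies inversely as the square of the tolerance.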


13. A coin is tossed s times. It is desired that the relative frequency of the
appearance of heads shall not be greater than .502 or less than .498. Find
the smallest value of s that will insure the foregoing results with a degree
of certainty Qs ^ If. 

14. (Camp) A census report showed that in general 59.58% of New York City 
children went to school, but that only 56.8% of the negro children went 
to school. The number of negro children was 20,000. Was the difference 
due to chance? 

15. Read and give a report on the reference given at the end of § 17.

16. Find applications of the Poisson exponential function in the literature and 
report on them in class. 



CHAPTER II 

SOME USEFUL INTEGRALS AND FUNCTIONS 


To avoid interruption later on we will discuss here certain integrals 
and functions which will be useful in subsequent chapters. 

1. The Gamma Function. The improper integral

(1)    $\Gamma(n) = \int_0^\infty x^{n-1} e^{-x}\,dx, \qquad n > 0,$

is called the Gamma function of the positive number n. The difference
equation

(2)    $\Gamma(n + 1) = n\,\Gamma(n)$

is easily established from (1) by integration by parts (see the chapter
on the Gamma function in any textbook on advanced calculus).
By successive reduction of (2) we obtain

    $\Gamma(n + 1) = n(n - 1) \cdots (n - k)\,\Gamma(n - k)$

where k is a positive integer less than n. If n is also a positive
integer and k = n - 1 then we have

(3)    $\Gamma(n + 1) = n!$

since from (1), $\Gamma(1) = 1$. Because of (3) the Gamma function is
sometimes called the factorial function. It may be considered as a
generalization of n! when n is fractional. The graph of the function
defined in (1) is shown in Figure 6. It can be drawn from the following
values, some of which follow immediately from (2) and the others will be
established later.

    $\Gamma(0) = \infty$        $\Gamma(2) = 1$
    $\Gamma(\tfrac{1}{2}) = \pi^{1/2}$      $\Gamma(3) = 2$
    $\Gamma(1) = 1$        $\Gamma(4) = 6$
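The relations (2) and (3) and the tabulated values are easy to check numerically. The following is a minimal sketch using `math.gamma` from the Python standard library; the particular values of n tried are arbitrary choices made for illustration.

```python
import math

# (3): Gamma(n + 1) = n! for positive integers n.
for n in range(1, 6):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))

# (2): the difference equation Gamma(n + 1) = n Gamma(n) also holds for fractional n.
n = 2.5
assert math.isclose(math.gamma(n + 1), n * math.gamma(n))

# Gamma(1/2) compared with the square root of pi.
half = math.gamma(0.5)
print(half, math.sqrt(math.pi))
```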
Other forms of (1) may be obtained by changes of variable. For
example,

(4)    $\Gamma(n) = 2\int_0^\infty y^{2n-1} e^{-y^2}\,dy,$    by $x = y^2$.

From this form we can show that

(5)    $\int_0^\infty e^{-y^2}\,dy = \frac{\pi^{1/2}}{2}.$

To establish (5) we first observe from (4) that

(6)    $\Gamma(\tfrac{1}{2}) = 2\int_0^\infty e^{-y^2}\,dy.$

Since (6) is independent of the variable of integration, we may also
write

    $\Gamma(\tfrac{1}{2}) = 2\int_0^\infty e^{-x^2}\,dx.$

So

    $[\Gamma(\tfrac{1}{2})]^2 = 4\int_0^\infty e^{-x^2}\,dx \int_0^\infty e^{-y^2}\,dy$

(7)    $\qquad\qquad = 4\int_0^\infty\!\!\int_0^\infty e^{-(x^2+y^2)}\,dx\,dy,$


the passage from the product of two integrals to the double integral
being valid since neither the limits nor the integrand of either integral
depend on the variable in the other.

To evaluate (7) it will be convenient to change to polar coordinates.
First, however, we will make a few remarks about a change
of variables in general. Let x and y be the coordinates of a point
with respect to a set of rectangular axes in a plane, u and v the
coordinates of another point with respect to a similarly chosen set
of rectangular axes in some other plane. Suppose we have a function
of the variables (x, y),

    $z = f(x, y),$

and we make x and y depend on new variables u and v by the relations

    $x = g(u, v)$ and $y = h(u, v).$
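These relations establish a certain correspondence between the
points of the two planes. Let dA be an element of area for the
function f(x, y). Then it is shown in advanced calculus * that

    $dA = J\!\left(\frac{x, y}{u, v}\right) du\,dv$

where $J\!\left(\frac{x, y}{u, v}\right)$ is a convenient symbol for the
absolute value of the determinant

    $\begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\ \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix} = \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u}$

and the latter is called the Jacobian or functional determinant of the
transformation.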

If, then, we change (7) to polar coordinates by letting

(8)    $x = r\cos\theta, \qquad y = r\sin\theta,$

the Jacobian is

    $\begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r.$

Therefore, the element of integration dx dy becomes $r\,dr\,d\theta$. The
limits of integration are now from 0 to $\infty$ for r and from 0 to $\pi/2$
for $\theta$. From (8), $x^2 + y^2 = r^2$. So (7) becomes †

    $[\Gamma(\tfrac{1}{2})]^2 = 4\int_0^{\pi/2}\!\!\int_0^\infty e^{-r^2}\, r\,dr\,d\theta = \pi.$

* See Mathematical Analysis, Goursat-Hedrick, vol. 1.
† The transformation to polar coordinates and subsequent integration
involves a remainder term T which is the integral over an area between
a quadrant of radius R and a square of side R. But it can be shown that
$T \to 0$ as $R \to \infty$. (Cf. Wilson's Advanced Calculus, p. 364.)





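Hence,

(9)    $\Gamma(\tfrac{1}{2}) = \pi^{1/2}$

and from (9) and (6) we obtain (5).

For a more general form of (5) we may let $y = t/(2k)^{1/2}$, $k > 0$,
and obtain

(10)    $\int_0^\infty e^{-t^2/2k}\,dt = (\tfrac{1}{2}\pi k)^{1/2},$

and

(10a)    $\int_{-\infty}^{\infty} e^{-t^2/2k}\,dt = (2\pi k)^{1/2}.$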

An alternate derivation of (9) may be given as follows. The right-hand
member of (7) represents the volume V under the bell-shaped surface

(11)    $z = e^{-(x^2 + y^2)}$

and so from (7) we have $\Gamma(\tfrac{1}{2}) = V^{1/2}$. Since (11) is
a surface of revolution we may take as the element of volume a
cylindrical shell of radius r, thickness dr, and height z. Then

    $dV = 2\pi r\,dr\,z = 2\pi r e^{-r^2}\,dr,$

    $V = 2\pi\int_0^\infty r e^{-r^2}\,dr = \pi,$

and consequently we obtain (9).

2. Stirling's Approximation. An asymptotic expression, that is,
an approximation with small percentage error, may be obtained
for n! when n is large. The following formula

(12)    $n! \approx (2\pi)^{1/2}\, n^{n+1/2}\, e^{-n}$

is called Stirling's approximation. A closer approximation is

    $n! \approx (2\pi)^{1/2}\, n^{n+1/2}\, e^{-n}\left(1 + \frac{1}{12n} + \cdots\right).$

However, the first term usually gives sufficiently close approximations
if n is fairly large. A derivation of (12) may be found in
several places. Among these are

Fry, Probability and Its Engineering Uses, D. Van Nostrand Company; and
Uspensky, Introduction to Mathematical Probability, McGraw-Hill Company.
Seven-place tables of log n! up to n = 1000 are given in Glover's Tables.
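To see how small the percentage error of (12) becomes, the ratio of the approximate to the exact value may be computed for a few values of n. The sketch below assumes nothing beyond (12) and the correction factor $1 + 1/12n$; the function names and the values of n are illustrative choices.

```python
import math


def stirling(n: int) -> float:
    """Stirling's approximation (12) to n!."""
    return math.sqrt(2.0 * math.pi) * n ** (n + 0.5) * math.exp(-n)


def stirling_corrected(n: int) -> float:
    """The closer approximation, with the factor 1 + 1/(12 n)."""
    return stirling(n) * (1.0 + 1.0 / (12.0 * n))


for n in (5, 20, 50):
    exact = math.factorial(n)
    # Ratios near 1 indicate a small percentage error.
    print(n, stirling(n) / exact, stirling_corrected(n) / exact)
```

The first ratio falls short of unity by roughly 1/12n, which is why the correction factor removes most of the discrepancy.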

3. The Beta Function. The definite integral

(13)    $B(m, n) = \int_0^1 x^{m-1}(1 - x)^{n-1}\,dx$

is called the Beta function of any two positive numbers m and n.
Another useful form is

(14)    $B(m, n) = 2\int_0^{\pi/2} \sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta$

which is obtained by letting $x = \sin^2\theta$ in (13).
If we let x = 1 - y, (13) becomes

    $B(m, n) = \int_0^1 (1 - y)^{m-1} y^{n-1}\,dy$

    $\qquad\quad = \int_0^1 x^{n-1}(1 - x)^{m-1}\,dx$

    $\qquad\quad = B(n, m).$

Therefore, m and n may be interchanged.

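A relation between the Beta and Gamma functions may be obtained
as follows. From (4) we may write

    $\Gamma(n)\Gamma(m) = 4\int_0^\infty x^{2n-1}e^{-x^2}\,dx \int_0^\infty y^{2m-1}e^{-y^2}\,dy$

    $\qquad\qquad = 4\int_0^\infty\!\!\int_0^\infty x^{2n-1}y^{2m-1}e^{-(x^2+y^2)}\,dx\,dy.$

Since the region of integration is the first quadrant of the xy-plane
we have, upon changing to polar coordinates,

    $\Gamma(n)\Gamma(m) = 4\int_0^{\pi/2}\!\!\int_0^\infty r^{2(m+n-1)}e^{-r^2}\sin^{2m-1}\theta\,\cos^{2n-1}\theta\; r\,dr\,d\theta$

    $\qquad\qquad = \left[2\int_0^{\pi/2}\sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta\right]\left[2\int_0^\infty r^{2(m+n)-1}e^{-r^2}\,dr\right]$

    $\qquad\qquad = B(m, n)\,\Gamma(m + n),$

by (14) and (4). Hence

(15)    $B(m, n) = \frac{\Gamma(m)\,\Gamma(n)}{\Gamma(m + n)}.$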

4. Reduction to Gamma and Beta Functions. By appropriate 
changes of variables many of the integrals that occur in statistics 
may be evaluated by expressing them in terms of Gamma and Beta 
functions. 


Examples 


(a) Prove that 


XV.— .,-Kf)"(D 

Solution, This integral may be written 

2 J* W d(y^). 


By the substitution 


this becomes 


d(y^)^—dz 


1 /2(r2V/2 

dw) X* 

l('?LT’r(D. 


2\N/ 

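(b) Determine k so that

    $k\int_0^\infty e^{-Ns^2/2\sigma^2}\,(s^2)^{(N-3)/2}\,d(s^2) = 1.$

Solution. By the substitution

    $x = \frac{Ns^2}{2\sigma^2}, \qquad d(s^2) = \frac{2\sigma^2}{N}\,dx,$

this becomes

    $k\left(\frac{2\sigma^2}{N}\right)^{(N-1)/2}\int_0^\infty e^{-x}\,x^{(N-3)/2}\,dx = 1.$
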
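(c) By the substitution $z = \tan\theta$, the integral

    $\int_{-\infty}^{\infty}(1 + z^2)^{-N/2}\,dz$

transforms to

    $\int_{-\pi/2}^{\pi/2}\cos^m\theta\,d\theta$

where m = N - 2. From Exercise 9 below we find that

    $\int_0^{\pi/2}\cos^m\theta\,d\theta = \tfrac{1}{2}B[(m + 1)/2,\ \tfrac{1}{2}],$

whence

    $\int_{-\infty}^{\infty}(1 + z^2)^{-N/2}\,dz = B[(N - 1)/2,\ \tfrac{1}{2}].$
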

5. Incomplete Beta and Gamma Functions. The integral

(16)    $\Gamma_x(n + 1) = \int_0^x e^{-x} x^n\,dx$

is called the incomplete Gamma function. Similarly

(17)    $B_x(m, n) = \int_0^x x^{m-1}(1 - x)^{n-1}\,dx$

is called the incomplete Beta function. Both (16) and (17) are
useful functions in mathematical statistics and they have been
tabulated by Karl Pearson and his staff at the Biometric Laboratory,
University College, London. They are published by the Cambridge
University Press.

Exercises 

1. Show that the Gamma function becomes infinite when n = 0. Hint. From
(2) you can obtain

    $\Gamma(n + k) = (n + k - 1)\cdots(n + 1)\,n\,\Gamma(n),$

that is

    $\Gamma(n) = \frac{\Gamma(n + k)}{n(n + 1)\cdots(n + k - 1)}.$

2. Show that

    $\int_{-\infty}^{\infty}\phi(t)\,dt = 1$ where $\phi(t) = \frac{1}{(2\pi)^{1/2}}\,e^{-t^2/2}.$


3. Prove that r(i) = 
4. Evaluate | + a;/2)^ dx by transforming it into a Gamma function.

J ~.2 

Hint. Let cy — I a;/2 and determine c so that 
Ans. {e^7\)/Zm. 

X co 

g~ 2 a;(^ — 6 )^ da;. Aws. 6-122-87 f. 

J ^oo ^<0 

given that I dx = 

0 t/O 

r(|) - i-r(-i). 

7. Find the difference and the ratio between the exact value of 10! and the
approximate value obtained by using Stirling’s formula. 

8. Using (15) show that 


Kf) 1 

BIK" - >)• «■ 


9. Show that $\int_0^{\pi/2}\cos^m\theta\,d\theta = \tfrac{1}{2}B[(m + 1)/2,\ \tfrac{1}{2}]$. Hint. Use (14).

10. Given that $f(n) = n^{1/2}B(n/2,\ \tfrac{1}{2})$, show that $\lim_{n\to\infty} f(n) = (2\pi)^{1/2}$.
GENERAL CONCEPT OF DISTRIBUTION FUNCTION OF A CONTINUOUS
VARIABLE. GENERALIZED FREQUENCY CURVES

1. Fundamental Notions and Definitions. The notion of distribu- 
tion functions relates to theoretical universes. The concept is an 
idealization of observed distributions comparable to the idealization 
of the outlines of material objects into the straight lines and circles 
of geometry. 

A continuous variable x is said to have the distribution function
f(x), which we take to be single-valued and non-negative, if the
frequency of occurrence of x in the range a < x < b is measured by

(1)    $\int_a^b f(x)\,dx.$

If x has the distribution function f(x) with total frequency N, then

(2)    $\int_{-\infty}^{\infty} f(x)\,dx = N,$

and y = f(x) is called a theoretical frequency curve or, more briefly,
a frequency curve. If the actual occurrence of the variable is limited
to a finite range, f(x) is defined to be identically zero outside that
range. If the total area under the curve is taken as unity, so that

(3)    $\int_{-\infty}^{\infty} f(x)\,dx = 1,$

then y = f(x) is variously called the probability density, the
probability distribution, or the probability function of x. Then, f(x) dx
gives, to within infinitesimals of order higher than that of dx, the
probability that x lies in the interval (x, x + dx). Under condition
(3), the integral (1) denotes the probability that x lies in the interval
(a, b). Under condition (2), (1) denotes the frequency of values in
the interval (a, b). A distribution function can be regarded, therefore,
either as a frequency curve or as a probability curve according
as condition (2) or (3) is imposed. The distinction can be adjusted
by determining appropriately a constant factor in y = f(x).
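2. Moments. If x is distributed in accord with the frequency
curve y = f(x), with total frequency N, the moment of order k about
the y-axis is defined by

(4)    $\nu_k = \frac{1}{N}\int_{-\infty}^{\infty} x^k f(x)\,dx.$

In particular, for k = 1 we have the mean, $\nu_1 = \bar{x}$,

    $\bar{x} = \frac{1}{N}\int_{-\infty}^{\infty} x f(x)\,dx.$

If the mean is taken as the origin of measurement, so that

    $\int_{-\infty}^{\infty} (x - \bar{x}) f(x)\,dx = 0,$

then the moment of order k about the mean is defined by

(5)    $\mu_k = \frac{1}{N}\int_{-\infty}^{\infty} (x - \bar{x})^k f(x)\,dx.$

In particular, when k = 2 we have the variance, $\mu_2 = \sigma^2$.

The $\mu$'s can be expressed in terms of the $\nu$'s by the relation

(6)    $\mu_k = \nu_k - C(k, 1)\nu_{k-1}\nu_1 + C(k, 2)\nu_{k-2}\nu_1^2 - \cdots + (-1)^r C(k, r)\nu_{k-r}\nu_1^r + \cdots + (-1)^{k-1}[C(k, k - 1) - 1]\nu_1^k$

where

    $C(k, r) = \frac{k!}{(k - r)!\,r!}.$

In particular, the following relations are useful in computations:

(7)    $\mu_2 = \nu_2 - \nu_1^2$
       $\mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3$
       $\mu_4 = \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4.$

The first of (7) is proved below and the others may be established
in a similar way.

    $\mu_2 = \frac{1}{N}\int_{-\infty}^{\infty}(x - \bar{x})^2 f(x)\,dx$

    $\quad = \frac{1}{N}\int_{-\infty}^{\infty} x^2 f(x)\,dx - \frac{2\bar{x}}{N}\int_{-\infty}^{\infty} x f(x)\,dx + \frac{\bar{x}^2}{N}\int_{-\infty}^{\infty} f(x)\,dx$

    $\quad = \frac{1}{N}\int_{-\infty}^{\infty} x^2 f(x)\,dx - \bar{x}^2 = \nu_2 - \nu_1^2.$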
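The passage from moments about the origin to moments about the mean can be illustrated on a small set of observations, for which sums take the place of the integrals. The sketch below is not from the text: the data are hypothetical and the function names are assumptions. It expands $(x - \nu_1)^k$ by the binomial theorem, with $\nu_0 = 1$, and compares the result with moments computed directly about the mean.

```python
import math


def raw_moments(data, order):
    """nu_0, nu_1, ..., nu_order about the origin (nu_0 = 1)."""
    n = len(data)
    return [sum(x ** k for x in data) / n for k in range(order + 1)]


def central_from_raw(nu, k):
    """mu_k obtained from the nu's by expanding (x - nu_1)^k."""
    return sum((-1) ** r * math.comb(k, r) * nu[k - r] * nu[1] ** r
               for r in range(k + 1))


def central_direct(data, k):
    """mu_k computed directly about the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


data = [1.0, 2.0, 2.0, 3.0, 7.0]  # hypothetical observations
nu = raw_moments(data, 4)
for k in (2, 3, 4):
    print(k, central_from_raw(nu, k), central_direct(data, k))
```

The last two terms of the binomial expansion combine into the single final term of the general relation, which is why the coefficient of the highest power of $\nu_1$ is k - 1 in absolute value.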
In standard units the moment of order k is defined by

(8)    $\alpha_k = \frac{\mu_k}{\sigma^k} = \int_{-\infty}^{\infty} t^k h(t)\,dt,$

where

(9)    $h(t) = \frac{\sigma}{N} f(\sigma t + \bar{x}) = \frac{\sigma}{N} f(x).$

From (8) we have

    $\alpha_0 = 1, \qquad \alpha_1 = 0, \qquad \alpha_2 = 1.$

Analogous definitions of moments could be given for probability
functions. When N = 1, in accordance with (3), the integrals in
(4) and (5) are also called expected values. The language of expected
values will be used in another chapter where we will be dealing more
with probability functions. Before proceeding with the discussion
of frequency curves, however, we will give an example of a
probability curve.



Example. The Cauchy curve is a classical example of a probability
distribution although its use in present day statistics is relatively
unimportant. Its equation is

(10)    $y = \frac{b}{\pi(b^2 + x^2)}, \qquad -\infty \le x \le \infty, \qquad b > 0.$

The curve is symmetrical having its center at x = 0.

A simple derivation of this function is as follows. For a given real constant
b locate the point (0, b) as in the figure below. Let lines be drawn at random
through (0, b) and let $\theta$ be the variable angle between any such line and the
negative direction of the y-axis; $\theta$ varies between the limits $-\pi/2$
and $\pi/2$. The hypothesis is that all values of $\theta$ in this range are
equally likely. Denote the intercepts on the horizontal axis by x.
Clearly, $-\infty < x < \infty$. The relation between $\theta$ and x is

    $\theta = \tan^{-1}\frac{x}{b}.$

Under the hypothesis, the probability that an angle $\theta$ will be
contained between $\theta$ and $\theta + d\theta$ is $d\theta/\pi$.
By differentiation we find the relation between $d\theta$ and dx to be

(11)    $\frac{d\theta}{\pi} = \frac{b\,dx}{\pi(b^2 + x^2)}.$

Therefore, the points of intersection of the lines with the x-axis are distributed
so that the probability that a value of x will fall in the range dx is given by the
right-hand member of (11). Hence the probability function for the variable x is

    $y = \frac{b}{\pi(b^2 + x^2)}$

and the probability that x lies in a finite interval (c, d) is given by

    $\frac{b}{\pi}\int_c^d \frac{dx}{b^2 + x^2},$

since the integral of the right-hand member of (11) from $-\infty$ to $\infty$ is equal to
unity as can easily be verified. However, we cannot speak here of the mean
value of x or of moments of higher order, since the integral

    $\int_{-\infty}^{\infty} \frac{x^k\,dx}{b^2 + x^2}$

has no meaning for k > 0. This restriction does not apply to probability
functions in general.
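The geometric derivation of the Cauchy curve lends itself to a simple sampling experiment: draw the angle uniformly between $-\pi/2$ and $\pi/2$, compute the intercept $x = b\tan\theta$, and compare the proportion of intercepts falling in an interval (c, d) with the integral of the probability function over that interval. The sketch below is only illustrative; the function names, the number of trials, and the seed are assumptions.

```python
import math
import random


def cauchy_interval_probability(c: float, d: float, b: float) -> float:
    """Integral of b / (pi (b^2 + x^2)) from c to d."""
    return (math.atan(d / b) - math.atan(c / b)) / math.pi


def simulate_intercepts(c: float, d: float, b: float,
                        trials: int = 200_000, seed: int = 1) -> float:
    """Proportion of random lines through (0, b) whose intercept lies in (c, d)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        theta = rng.uniform(-math.pi / 2.0, math.pi / 2.0)  # all angles equally likely
        x = b * math.tan(theta)                              # intercept on the x-axis
        if c <= x <= d:
            hits += 1
    return hits / trials


b = 2.0
print(cauchy_interval_probability(-b, b, b), simulate_intercepts(-b, b, b))
```

For the interval (-b, b) the exact probability is one-half, whatever the value of b.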


3. The Pearson System. There are two systems of generalized
frequency curves in common use: the Pearson system and the
Gram-Charlier system.

During the years 1895-1916 Karl Pearson published papers in
which he showed that a set of frequency curves could be obtained
by assigning values to the parameters in a certain first order
differential equation. The Pearson school claims that all the different
types of frequency distributions that arise in practical statistics
can be represented by the solutions of this equation.

With regard to the genesis of the Pearson system, one point of
view is to regard it as empirical. Thus, starting with the differential
equation

(12)    $\frac{1}{y}\frac{dy}{dt} = \frac{m - t}{a + bt + ct^2},$

it is observed that the solutions of (12) must satisfy certain geometrical
properties of unimodal frequency distributions, namely, (a) the
curve should vanish at the ends of the range, i.e., as $y \to 0$, $dy/dt \to 0$;
(b) when t = m, corresponding to a mode, dy/dt = 0.

Among the solutions* of (12) there are several types of curves,
the shapes depending on the parameters a, b, c, and m. Examples
of symmetrical, skewed, U-shaped and J-shaped curves with finite
and infinite range in either or both directions, are shown in Figure 9.

Fig. 9. Typical Curves of the Pearson System


The parameters in (12) can be expressed in terms of the moments
of the system. Clearing (12) of fractions, multiplying by $t^k\,dt$, and
integrating over all admissible values of t we have

(13)    $\int (at^k + bt^{k+1} + ct^{k+2})\frac{dy}{dt}\,dt = \int y(mt^k - t^{k+1})\,dt.$

* What we actually do is to derive equations under the stated assumption that
$y \to 0$ as $dy/dt \to 0$, and then generalize the results so as to admit as distribution
functions solutions which do not vanish at the end(s) of the range.

Integrating the left-hand side by parts, we obtain

(14)    $t^k(a + bt + ct^2)y - \int y[akt^{k-1} + b(k + 1)t^k + c(k + 2)t^{k+1}]\,dt = \int y(mt^k - t^{k+1})\,dt.$

If y vanishes at the ends of the range, then the first expression
in (14) vanishes. If, in (12), y = h(t) we have from (8) and (14),

(15)    $m\alpha_k + ak\alpha_{k-1} + b(k + 1)\alpha_k + c(k + 2)\alpha_{k+1} = \alpha_{k+1}.$

Assigning k successively the values k = 0, 1, 2, 3, we obtain from
(15) the four equations

(16)    $m + b = 0$
        $a + 3c = 1$
        $m + 3b + 4c\alpha_3 = \alpha_3$
        $m\alpha_3 + 3a + 4b\alpha_3 + 5c\alpha_4 = \alpha_4$

from which the parameters can be determined. Solving (16) we
obtain

(17)    $m = \frac{1}{D}[\alpha_3(3 + \alpha_4)]$
        $a = \frac{1}{D}[3\alpha_3^2 - 4\alpha_4]$
        $b = \frac{1}{D}[-\alpha_3(3 + \alpha_4)]$
        $c = \frac{1}{D}[6 + 3\alpha_3^2 - 2\alpha_4]$
        $D = 18 + 12\alpha_3^2 - 10\alpha_4.$

Carver* has expressed (17) in the more convenient form

(18)    $m = -\frac{\alpha_3}{2(1 + 2\delta)}, \qquad b = \frac{\alpha_3}{2(1 + 2\delta)},$
        $a = \frac{2 + \delta}{2(1 + 2\delta)}, \qquad c = \frac{\delta}{2(1 + 2\delta)},$
        $\delta = \frac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3}.$

* See the Handbook of Mathematical Statistics, Rietz et al.

Substitution of the above values into (15) yields an important
recursion formula for the moments of the Pearson system:

(19)    $\alpha_{k+1} = \frac{k}{2 - (k - 2)\delta}\,[(2 + \delta)\alpha_{k-1} + \alpha_3\alpha_k].$

For our purposes the most important curves in the Pearson system
are the Type VII (normal curve) and Type III. These will now
be discussed in some detail.
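The solution (17) of the four equations (16), and its agreement with Carver's form (18), can be verified numerically for any admissible pair of moments. The sketch below is an illustration under stated assumptions: the function names are hypothetical and the values $\alpha_3 = 0.5$, $\alpha_4 = 3.5$ are chosen arbitrarily.

```python
def pearson_parameters(alpha3: float, alpha4: float):
    """m, a, b, c from (17)."""
    D = 18.0 + 12.0 * alpha3 ** 2 - 10.0 * alpha4
    m = alpha3 * (3.0 + alpha4) / D
    a = (3.0 * alpha3 ** 2 - 4.0 * alpha4) / D
    b = -alpha3 * (3.0 + alpha4) / D
    c = (6.0 + 3.0 * alpha3 ** 2 - 2.0 * alpha4) / D
    return m, a, b, c


def carver_parameters(alpha3: float, alpha4: float):
    """The same parameters in Carver's form (18)."""
    delta = (2.0 * alpha4 - 3.0 * alpha3 ** 2 - 6.0) / (alpha4 + 3.0)
    scale = 2.0 * (1.0 + 2.0 * delta)
    return -alpha3 / scale, (2.0 + delta) / scale, alpha3 / scale, delta / scale


alpha3, alpha4 = 0.5, 3.5  # illustrative values only
m, a, b, c = pearson_parameters(alpha3, alpha4)
# Residuals of the four equations (16); each should be zero up to rounding.
print(m + b,
      a + 3.0 * c - 1.0,
      m + 3.0 * b + 4.0 * c * alpha3 - alpha3,
      m * alpha3 + 3.0 * a + 4.0 * b * alpha3 + 5.0 * c * alpha4 - alpha4)
print(carver_parameters(alpha3, alpha4))
```

With $\alpha_3 = 0$ and $\alpha_4 = 3$ both forms give m = b = c = 0 and a = 1, the case treated next.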

Type VII. If $\alpha_3 = 0 = \delta$, then (12) becomes

    $\frac{1}{y}\frac{dy}{dt} = -t$

which upon integration yields the so-called normal curve

(20)    $y = Ce^{-t^2/2}, \qquad -\infty < t < \infty.$

The constant C may be determined so that the area under the curve
is N. Imposing this condition and making use of (10a) of Chapter II
we find that $C = N/(2\pi)^{1/2}$, and so (20) becomes

    $y = \frac{N}{(2\pi)^{1/2}}\,e^{-t^2/2}.$

It is conventional to write this in the form

(21)    $y = \frac{N}{\sigma}\,\phi(t)$

where

    $\phi(t) = \frac{1}{(2\pi)^{1/2}}\,e^{-t^2/2}.$

We may call $\phi(t)$ the normalized normal curve.

Type III. If $\delta = 0$ but $\alpha_3 \ne 0$ we see from (18) that (12)
becomes

    $\frac{1}{y}\frac{dy}{dt} = -\frac{\frac{\alpha_3}{2} + t}{1 + \frac{\alpha_3}{2}t}$

which upon integration yields the Type III curve

(22)    $y = K(A + t)^{A^2 - 1}e^{-At}$

where $A = 2/\alpha_3$, the range being $(-A, \infty)$. The criterion for a
Type III curve is that $\delta = 0$. That is, if a Type III curve is to
represent an observed distribution the observed moments should
satisfy, at least approximately, the relation

    $2\alpha_4 - 3\alpha_3^2 - 6 = 0.$

Definitions of moments of an observed distribution are given in
Part I.

The constant K in (22) may be determined by the condition

(23)    $\int_{-A}^{\infty} y\,dt = N.$

This integral can be evaluated by means of the Gamma function.
Let $A^2 = n/2$ and let $A(A + t) = \chi^2/2$. Then we have
$A + t = \chi^2/2A$ and $dt = d(\chi^2)/2A$.

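Making these substitutions in (23) we obtain

    $\frac{Ke^{A^2}}{A^{A^2}}\,\Gamma(A^2) = N$

and therefore

    $K = \frac{NA^{A^2}}{e^{A^2}\,\Gamma(A^2)}.$

So with $\chi^2$ as the independent variable, (22) becomes

(22a)    $y = \frac{N}{2^{n/2}\,\Gamma(n/2)}\,(\chi^2)^{(n-2)/2}\,e^{-\chi^2/2}.$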



When N = 1, (22a) defines the probability distribution of $\chi^2$. This
is an important function which we shall use in subsequent discussions.

The designation Type III is usually restricted to the case
for which $A^2 \ne 1$. When $A^2 > 1$, that is, when $|\alpha_3| < 2$, the curve is
bell-shaped as shown in Figure 10.

Fig. 10. Type III Curve When $|\alpha_3| < 2$

In the Pearson system, the distance between the mean and
mode is $m = -\alpha_3/2(1 + 2\delta)$, and is a measure of skewness.
Under the conditions imposed for Type VII, m = 0. For Type III,
however, $m = -\alpha_3/2$ and therefore we have

    $|\text{mean} - \text{mode}| = \frac{|\alpha_3|}{2}.$

Because of this relation, $|\alpha_3|/2$ is sometimes used as a measure of
skewness in observed distributions. The curve for $\alpha_3 = -k$ (k =
a constant) is a reflection of that for $\alpha_3 = k$ through the line t = 0.

When $A^2 < 1$, that is, when $|\alpha_3| > 2$, the curve is J-shaped with
an infinite ordinate at t = -A.

The special case for which $A^2 = 1$ is known in the Pearson system
as Type X. When $\alpha_3 = \pm 2$, (22) becomes

    $y = Ke^{\mp t}.$

This is also known as Laplace's second frequency curve.

Tables of ordinates and areas of the Type III curve have been pub- 
lished by Salvosa in the Annals of Mathematical Statistics, vol. 1, no. 2. 

A systematic treatment of all the curves in the Pearson system 
has recently been given in a paper entitled A New Exposition and 
Chart for the Pearson System of Frequency Curves by C. C. Craig, 
Annals of Mathematical Statistics, vol. 7, no. 1, pp. 16-28. 

4. Genesis of the Pearson Curves in the Theory of Probability. 
The differential equation (12) is supposed to have some support 
in the theory of probability. This claim rests on the assumption 
that the distribution of statistical material may be likened to a priori 
distributions in certain urn schemata. The method by which (12) 
is associated with underlying probabilities is started by considering 
the following problem. 

An urn contains n balls of which np are white, so that the probability
of drawing a white ball in a single trial is p. The rest of the
balls, nq, are black, and the probability of failure to draw a white
ball in a single trial is q = 1 - p. If s balls are drawn from the
urn one at a time with replacements after each draw, what is the
probability, B(x), of drawing exactly x white balls and (s - x)
black balls?

From the Bernoulli theory it is known that the probabilities of
getting x = 0, 1, 2, ..., s successes in s trials are given by the
successive terms of the binomial

(24)    $(q + p)^s = \sum_{x=0}^{s} B(x)$

where

    $B(x) = C(s, x)\,p^x q^{s-x}.$


Representing the terms B(x) by ordinates $y_x$, one may plot the
(s + 1) points $(x, y_x)$. Through these (s + 1) points one may
imagine a curve that can be represented by an analytic function.
Since

    $y_x = C(s, x)\,p^x q^{s-x}$

and hence

    $y_{x+1} = C(s, x + 1)\,p^{x+1} q^{s-x-1}$

we have

(25)    $\frac{y_{x+1}}{y_x} = \frac{sp - px}{qx + q}.$

From (25) we obtain

(26)    $\frac{y_{x+1} - y_x}{y_{x+1} + y_x} = \frac{sp - q - x}{sp + q + (q - p)x}.$

Now the mean of any two ordinates ($y_x$ and $y_{x+1}$) may be considered
as approximately equal to the ordinate ($y_{x+1/2}$) midway between
them. The slope of the line joining any two points $(x, y_x)$ and
$(x + 1, y_{x+1})$ is also approximately equal to the slope of the tangent
at the point midway between these two points on the continuous
curve. Under these two assumptions, (26) may be written as

(27)    $\frac{D_x y_{x+1/2}}{y_{x+1/2}} = \frac{2(sp - q - x)}{sp + q + (q - p)x}.$

The right member of this equation is, therefore, the derivative
of log y at the point $(x + \tfrac{1}{2},\ y_{x+1/2})$. At any point $(x, y_x)$ this
derivative is

(28)    $\frac{d(\log y)}{dx} = \frac{2\{sp - q - (x - \tfrac{1}{2})\}}{sp + q + (q - p)(x - \tfrac{1}{2})}.$

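If $p = q = \tfrac{1}{2}$, then (28) becomes

    $\frac{d(\log y)}{dx} = -\frac{x - s/2}{(s + 1)/4}$

which is of the form

    $\frac{dy}{y} = -\frac{(x - m)\,dx}{a}, \qquad a > 0,$

and which, upon integration, yields the normal curve

(29)    $y = ke^{-h^2(x - m)^2}$

where

    $h^2 = \frac{1}{2a}.$

The next step consists in dealing with the case $p \ne q$. From (28)
we have

    $\frac{d(\log y)}{dx} = -\frac{(x - sp) + \tfrac{1}{2}(q - p)}{spq + \tfrac{1}{4} + \tfrac{1}{2}(x - sp)(q - p)}.$

If we set

    $\alpha_3 = \frac{q - p}{(spq)^{1/2}}$

and

    $t = \frac{x - sp}{(spq)^{1/2}}$

the above equation becomes

(30)    $\frac{d(\log y)}{dt} = -\frac{\frac{\alpha_3}{2} + t}{1 + \frac{\alpha_3}{2}t + \frac{1}{4spq}}.$

If spq is so large that 1/4spq is negligibly small, (30) becomes

(31)    $\frac{d(\log y)}{dt} = -\frac{\frac{\alpha_3}{2} + t}{1 + \frac{\alpha_3}{2}t},$

which upon integration yields the Type III curve. It is evident
from (31) that this curve approaches the normal curve as a limit as
$\alpha_3 \to 0$.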

With p = q, (28) is of the form (12) when b = c = 0. With
$p \ne q$, (28) is of the form (12) when c = 0. To produce, in the
theory of probability, an expression comparable to (12) when both
b and c are different from zero it is necessary to consider a more
general urn problem. So far the underlying probability, p, has
been constant. If we consider the urn schemata previously described,
but remove the restriction of replacements, then the chance of success
is not constant from trial to trial but depends upon the results of
previous trials. Thus, without replacements, the chances of obtaining
exactly x = 0, 1, 2, ..., s white balls in a draw of s balls, are
given by the successive terms of the hypergeometric series

(32)    $\frac{1}{C(n, s)}\{C(np, 0)C(nq, s) + C(np, 1)C(nq, s - 1) + \cdots + C(np, x)C(nq, s - x) + \cdots + C(np, s)C(nq, 0)\}$

in which the general term is

    $H(x) = \frac{(np)!\,(nq)!\,s!\,(n - s)!}{(np - x)!\,(nq - s + x)!\,n!\,x!\,(s - x)!}.$

By representing the terms of this series as ordinates of a frequency
polygon, it is possible to show that * the slope, at the mid-point of
any side, divided by the ordinate at that point is equal to a fraction
whose numerator is a linear function of x and whose denominator
is a quadratic function of x. It is clear that (12) gives a general
statement of this property.

Since the hypergeometric series is associated with (12) and the
Bernoulli series is associated with a special case of (12), viz., when
c = 0, we should quite naturally expect that the Bernoulli series
is a special case of the hypergeometric series. Writing H(x) in the
form

    $C(s, x)\,\frac{p\{p - 1/n\}\cdots\{p - (x - 1)/n\}\; q\{q - 1/n\}\cdots\{q - (s - x - 1)/n\}}{\{1 - 1/n\}\cdots\{1 - (x - 1)/n\}\{1 - x/n\}\cdots\{1 - (s - 1)/n\}}$

it is obvious that

    $\lim_{n\to\infty} H(x) = C(s, x)\,p^x q^{s-x} = B(x).$

When $n = \infty$, there is an infinite supply in the urn, so the probability,
p, remains constant from trial to trial without replacements.
In other words, sampling from a finite supply with replacements is
the same as sampling from an infinite supply without replacements.
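The approach of the hypergeometric terms H(x) to the Bernoulli terms B(x) as the supply in the urn grows can be observed numerically. In the sketch below the function names and the values s = 10, p = 0.3 are illustrative assumptions; `math.comb` plays the part of C(n, r).

```python
import math


def hypergeometric_term(x: int, s: int, n: int, p: float) -> float:
    """H(x): exactly x white balls in s draws without replacement from n balls."""
    white = round(n * p)
    black = n - white
    return math.comb(white, x) * math.comb(black, s - x) / math.comb(n, s)


def bernoulli_term(x: int, s: int, p: float) -> float:
    """B(x): exactly x white balls in s draws with replacement."""
    return math.comb(s, x) * p ** x * (1.0 - p) ** (s - x)


def largest_gap(s: int, n: int, p: float) -> float:
    """Largest difference |H(x) - B(x)| over x = 0, 1, ..., s."""
    return max(abs(hypergeometric_term(x, s, n, p) - bernoulli_term(x, s, p))
               for x in range(s + 1))


for n in (100, 10_000, 1_000_000):
    print(n, largest_gap(10, n, 0.3))
```

The gap shrinks roughly in proportion to 1/n, in keeping with the factors of the form 1 - j/n in the expression for H(x).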

* See 1. Elderton, Frequency Curves and Correlation.
2. Rietz, Mathematical Statistics (Carus Monograph), Chapter III.
5. Further Discussion of the Normal Curve. We will now return
to a discussion of the normal curve, giving some proofs which had to
be omitted in Part I, and supplying explanations which in one
instance or another perhaps had to be read between the lines there.

A. Fitting the Curve. If (29) is to represent an observed distribution,
the parameters m, a, and k may be determined by the principle
of moments. Equating the kth functional moment to the kth
moment of observed data, for k = 0, 1, 2, we have three simultaneous
equations

(33)    $\int_{-\infty}^{\infty} ke^{-(x-m)^2/2a}\,dx = N$

        $\int_{-\infty}^{\infty} ke^{-(x-m)^2/2a}\,x\,dx = N\bar{x}$

        $\int_{-\infty}^{\infty} ke^{-(x-m)^2/2a}\,x^2\,dx = N\nu_2$

in which the parameters are the unknowns.

The solution of these equations can be made to depend upon the
integral

(34)    $\int_{-\infty}^{\infty} e^{-y^2/2a}\,dy = (2\pi a)^{1/2}$

which is evaluated in Chapter II. Using this result and letting
y = x - m, the first of equations (33) becomes

(a)    $k(2\pi a)^{1/2} = N.$

The second becomes

    $k\int_{-\infty}^{\infty} e^{-y^2/2a}\,y\,dy + km\int_{-\infty}^{\infty} e^{-y^2/2a}\,dy = N\bar{x}.$

In the above relation, the first integral vanishes because the integrand
is an odd function. So, using (34), we have

(b)    $km(2\pi a)^{1/2} = N\bar{x}.$

The third integral in (33) may be written in the form

    $k\int_{-\infty}^{\infty} y^2 e^{-y^2/2a}\,dy + 2km\int_{-\infty}^{\infty} y\,e^{-y^2/2a}\,dy + km^2\int_{-\infty}^{\infty} e^{-y^2/2a}\,dy.$

Upon integrating (by parts) the first integral in the above expression
and evaluating the other integrals, we obtain

(c)    $k(2\pi a)^{1/2}(m^2 + a) = N\nu_2.$

From (a) and (b) we find $m = \bar{x}$. From (a) and (c) we have

    $m^2 + a = \nu_2,$

and so

    $a = \mu_2 = \sigma^2.$

Therefore, (29) becomes

(35)    $y = \frac{N}{\sigma(2\pi)^{1/2}}\,e^{-(x - \bar{x})^2/2\sigma^2}.$

B. Moments. The general moment of odd order of (35) about
the mean is given by

    $\mu_{2k+1} = \frac{1}{\sigma(2\pi)^{1/2}}\int_{-\infty}^{\infty}(x - \bar{x})^{2k+1}e^{-(x - \bar{x})^2/2\sigma^2}\,dx.$

But the right member vanishes because the integrand is an odd
function. Therefore, all moments of odd order of the normal curve
taken about the mean are zero.

The general moment of even order is

    $\mu_{2k} = \frac{1}{\sigma(2\pi)^{1/2}}\int_{-\infty}^{\infty}(x - \bar{x})^{2k}e^{-(x - \bar{x})^2/2\sigma^2}\,dx.$

Integrating the right member by parts, letting $u = (x - \bar{x})^{2k-1}$, the
following recursion relation is obtained for even moments

(36)    $\mu_{2k} = (2k - 1)\sigma^2\mu_{2k-2}.$

Then when k = 1, $\mu_2 = \sigma^2$; when k = 2, $\mu_4 = 3\mu_2^2$; etc.

A recursion formula for the moments in standard units may also
be obtained from (19). Under the conditions imposed for Type VII,
(19) becomes

    $\alpha_{k+1} = k\alpha_{k-1}, \qquad k = 1, 3, 5, \cdots.$

Hence,

    $\alpha_2 = 1$
    $\alpha_4 = 3$
    $\alpha_6 = 1 \cdot 3 \cdot 5$
    $\alpha_{2k} = 1 \cdot 3 \cdot 5 \cdots (2k - 1) = \frac{(2k)!}{2^k\,k!}.$
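The recursion for the even moments of the normal curve in standard units, and its closed form, can be compared in a few lines. The function names in this sketch are illustrative assumptions.

```python
import math


def even_moments_by_recursion(count: int):
    """alpha_2, alpha_4, ... from alpha_{k+1} = k alpha_{k-1}, starting at alpha_0 = 1."""
    moments = []
    alpha = 1
    for k in range(1, 2 * count, 2):   # k = 1, 3, 5, ...
        alpha = k * alpha              # alpha_{k+1} = k * alpha_{k-1}
        moments.append(alpha)
    return moments


def even_moment_closed_form(k: int) -> int:
    """alpha_{2k} = (2k)! / (2^k k!)."""
    return math.factorial(2 * k) // (2 ** k * math.factorial(k))


print(even_moments_by_recursion(5))
print([even_moment_closed_form(k) for k in range(1, 6)])
```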
C. Quadrature. Some writers use the term quadrature for the
evaluation of an integral. The definite integral

(37)    $\Phi(t) = \frac{h}{\sqrt{\pi}}\int_0^t e^{-h^2x^2}\,dx, \qquad h = \frac{1}{\sqrt{2}},$

is commonly called the probability integral. Clearly it is a function
of the variable limit t. Although (37) cannot be evaluated in finite
form, it can be computed by expanding the integrand into a power
series and integrating as many terms as may be needed.

In (37) let y = hx. When x = t, y = ht. So (37) becomes

(38)    $\Phi(t) = \frac{1}{\sqrt{\pi}}\int_0^{ht} e^{-y^2}\,dy.$

Expanding the integrand of (38) we have

    $e^{-y^2} = 1 - y^2 + \frac{y^4}{2!} - \frac{y^6}{3!} + \cdots + (-1)^{n-1}\frac{y^{2n-2}}{(n - 1)!} + \cdots.$

Termwise integration yields the result

(39)    $\frac{1}{\sqrt{\pi}}\int_0^y e^{-y^2}\,dy = \frac{1}{\sqrt{\pi}}\left[y - \frac{y^3}{3} + \frac{y^5}{10} - \frac{y^7}{42} + \frac{y^9}{216} - R\right], \qquad R < \frac{y^{11}}{1320}.$

This series converges for all values of y, and the error made in stopping
at any term is numerically less than the first term neglected. For
small values of y it converges rapidly and is a satisfactory method for
computing when $y \le 1$.

But for large values of y, (39) converges too slowly to be practical;
too many terms are required. It is therefore important to obtain
an expansion in descending powers of y. To this end write

(40)    $\int_0^y e^{-y^2}\,dy = \int_0^\infty e^{-y^2}\,dy - \int_y^\infty e^{-y^2}\,dy = \frac{\sqrt{\pi}}{2} - \int_y^\infty e^{-y^2}\,dy,$

and

    $\int_y^\infty e^{-y^2}\,dy = \int_y^\infty \frac{1}{y}\,y\,e^{-y^2}\,dy.$

Integrating the last integral by parts we obtain

    $\int_y^\infty e^{-y^2}\,dy = \frac{e^{-y^2}}{2y} - \frac{1}{2}\int_y^\infty \frac{e^{-y^2}}{y^2}\,dy.$

Integrating successively by parts gives the result

(41)    $\int_y^\infty e^{-y^2}\,dy = \frac{e^{-y^2}}{2y}\left\{1 - \frac{1}{2y^2} + \frac{3}{4y^4} - \frac{3 \cdot 5}{8y^6} + \cdots\right\}.$

From (40) and (41) we have the final result

(42)    $\frac{1}{\sqrt{\pi}}\int_0^y e^{-y^2}\,dy = .5 - \frac{e^{-y^2}}{2y\sqrt{\pi}}\left\{1 - \frac{1}{2y^2} + \frac{3}{4y^4} - \frac{3 \cdot 5}{8y^6} + \cdots + (-1)^n T_{n+1} + \cdots\right\}$

where

    $T_{n+1} = \frac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2^n y^{2n}}$

and (n + 1) is the number of the term. The series in (42) is
called an asymptotic or semi-convergent series; it converges until a
minimum term is reached and then diverges. The general term
$T_{n+1}$ decreases so long as $n < y^2$. But after the integrations by
parts have been performed so many times that $n > y^2$, $T_{n+1}$ increases.
Of course the integrations should not be carried further. The value
obtained by using the series in (42) will differ from the true value by
less than the last term retained.

Tables of (37) may be computed by means of (39) for $y\ (= ht) \le 1$
and by (42) for y > 1. Such tables were computed long ago and
are available in many places.


Example. Evaluate (37) for t = 3 and check the result with the value of
$\int_0^t \phi(t)\,dt$ for t = 3 given in the tables in the Appendix.

Solution. Since $y = t/\sqrt{2}$ we are to evaluate (42) for $y = 3/\sqrt{2}$.
Substituting this value in (42) we have

$.5 - \frac{e^{-9/2}\sqrt{2}}{6\sqrt{\pi}}\left[ 1 - \frac{1}{9} + \frac{3}{81} - \frac{15}{729} + \frac{105}{6561} \right] = .5 - \frac{1}{3}\,\phi(3)\cdot(.9213) = .5 - .00136 = .49864.$

The value given in the tables is .49865.
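
The two expansions can be compared numerically. What follows is a minimal sketch and not part of the original text: the function names are our own, and the standard-library error function serves as an independent check through the identity $\frac{1}{\sqrt{\pi}}\int_0^y e^{-y^2}\,dy = \tfrac{1}{2}\,\mathrm{erf}(y)$.

```python
import math

def power_series(y, terms=30):
    """Ascending series (39): (1/sqrt(pi)) * sum (-1)^k y^(2k+1) / (k! (2k+1))."""
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * y ** (2 * k + 1) / (math.factorial(k) * (2 * k + 1))
    return total / math.sqrt(math.pi)

def asymptotic_series(y, n_terms):
    """Descending series (42), keeping the first n_terms terms of the bracket."""
    bracket = 0.0
    term = 1.0  # T_1 = 1
    for n in range(n_terms):
        bracket += (-1) ** n * term
        term *= (2 * n + 1) / (2.0 * y * y)  # T_{n+2} = T_{n+1} (2n+1) / (2 y^2)
    return 0.5 - math.exp(-y * y) / (2.0 * y * math.sqrt(math.pi)) * bracket

def reference(y):
    """The same integral from the standard library: erf(y) / 2."""
    return 0.5 * math.erf(y)

if __name__ == "__main__":
    y = 3 / math.sqrt(2)
    print(asymptotic_series(y, 5), reference(y))
    print(power_series(1.0), reference(1.0))
```

With $y = 3/\sqrt{2}$ and five terms of the bracket, `asymptotic_series` follows the hand computation of the Example. Carrying the bracket far past $n \approx y^2$ degrades the result, as the text warns.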
6. The Gram-Charlier Series. If a function f(x) gives only a
rough approximation to a frequency distribution, a more accurate
representation may be obtained by using the first few terms of the series

(43)  $F(x) = c_0 f(x) + c_1 f^{(1)}(x) + c_2 f^{(2)}(x) + \cdots + c_n f^{(n)}(x) + \cdots$

where f(x), called the "generating function," gives a first
approximation to the given distribution, and $f^{(n)}(x)$ is the nth derivative
of f(x) with respect to x.

It should be observed that series representation is also involved in
the Pearson system. For, suppose the differential equation underlying
that system is written in the form

$\frac{dy}{dx} = \frac{y(a - x)}{f(x)}.$

Then if it be assumed that f(x) is expressible as a power series which
is so rapidly convergent that the first few terms are sufficient, we have
the form given in (12). In the Pearson system the series occurs in
the differential equation of the function whereas in the Gram-Charlier
system it occurs in the function itself.

If in (43) the normal curve is taken as the generating function
then F(x) is known as the Gram-Charlier Type A series. In
discussing this series no essential loss of generality is suffered by using
standard units. Thus we may write

(44)  $F(t) = c_0\phi(t) + c_1\phi^{(1)}(t) + c_2\phi^{(2)}(t) + \cdots + c_n\phi^{(n)}(t) + \cdots$

where $\phi(t)$ is defined in (21). The moments of F(t) are defined by

(45)  $\alpha_n = \int_{-\infty}^{\infty} F(t)\,t^n\,dt$

and it follows that $\alpha_0 = 1$, $\alpha_1 = 0$, $\alpha_2 = 1$.

The coefficients $c_n$ in (44) may be expressed in terms of the moments
$\alpha_n$, because the functions $\phi^{(n)}(t)$ and the Hermite polynomials
$H_m(t)$ defined by the relation

(46)  $\phi^{(m)}(t) = (-1)^m H_m(t)\,\phi(t)$

form a biorthogonal system. That is

(47)  $\int_{-\infty}^{\infty} \phi^{(n)}(t)\,H_m(t)\,dt = 0$  for $m \ne n$,

(48)  $\int_{-\infty}^{\infty} \phi^{(n)}(t)\,H_m(t)\,dt = (-1)^n\,n!$  for $m = n$.

Proofs of (47) and (48) are available in the literature* and will be
omitted here. The recursion relation

(49)  $H_{n+1}(t) = t\,H_n(t) - n\,H_{n-1}(t)$

can be established.† By differentiating $\phi(t)$ we find from (46) that
$H_1 = t$ and since $H_0 = 1$ we can use (49) for $n \ge 1$.

To make use of the biorthogonal property noted above we multiply
both members of (44) by $H_n(t)$ and integrate. Under the assumption
that the series is uniformly convergent, we obtain

(50)  $\int_{-\infty}^{\infty} F(t)\,H_n(t)\,dt = c_n \int_{-\infty}^{\infty} \phi^{(n)}(t)\,H_n(t)\,dt$

since all terms of the right member vanish except the one with the
coefficient $c_n$. Hence from (50) and (48) we have

(51)  $c_n = \frac{(-1)^n}{n!} \int_{-\infty}^{\infty} F(t)\,H_n(t)\,dt.$

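From (51), (49), and (45) we obtain the following results:

$c_0 = \int_{-\infty}^{\infty} F(t)\,dt = 1,$

$c_1 = -\int_{-\infty}^{\infty} F(t)\,t\,dt = 0,$

$c_2 = \frac{1}{2!}\int_{-\infty}^{\infty} F(t)\,(t^2 - 1)\,dt = 0,$

$c_3 = -\frac{1}{3!}\int_{-\infty}^{\infty} F(t)\,(t^3 - 3t)\,dt = -\frac{\alpha_3}{3!},$

$c_4 = \frac{1}{4!}\int_{-\infty}^{\infty} F(t)\,(t^4 - 6t^2 + 3)\,dt = \frac{\alpha_4 - 3}{4!}.$

We have, therefore,

(52)  $F(t) = \phi(t) - \frac{\alpha_3}{3!}\,\phi^{(3)}(t) + \frac{\alpha_4 - 3}{4!}\,\phi^{(4)}(t) + \cdots$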

and F{x) = ^Fit). 

The values of $\phi(t)$, of its integral, and of its second to eighth
derivative, are given to five places of decimals in Glover's Tables.
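
When tables are not at hand, the first terms of the Type A series can be evaluated directly. The sketch below is illustrative only and not from the original text. It assumes the usual convention $\phi^{(n)}(t) = (-1)^n H_n(t)\,\phi(t)$ for (46), builds $H_n$ from the recursion (49), and keeps only the terms shown in (52); the function names are our own.

```python
import math

def hermite(n, t):
    """H_n(t) from the recursion (49), starting from H_0 = 1 and H_1 = t."""
    h_prev, h = 1.0, t
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, t * h - k * h_prev
    return h

def phi(t):
    """Normal curve in standard units."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def type_a(t, alpha3, alpha4):
    """Terms of (52) through the fourth derivative, written with H_3 and H_4."""
    # -(alpha3/3!) phi^(3) = +(alpha3/6) H_3 phi, because phi^(3) = -H_3 phi.
    return phi(t) * (1.0
                     + alpha3 / 6.0 * hermite(3, t)
                     + (alpha4 - 3.0) / 24.0 * hermite(4, t))

if __name__ == "__main__":
    for t in (-3, -2, -1, 0, 1, 2, 3):
        print(t, type_a(t, -1.2, 4.2))
```

Setting `alpha3 = 0` and `alpha4 = 3` returns the normal curve itself, which is a convenient check on the signs.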

* See Rietz, Mathematical Statistics, pp. 165-168.
† See Levy and Roth, Elements of Probability. Oxford. 1936.
Exercises


1. Prove that the points of inflection of the normal curve are equidistant from 

the mode. What are the coordinates of these points? 

2. If x has the distribution function y = f(x), with total frequency 1, the mean
deviation, M, about the value v is defined by

$M = \int_{-\infty}^{\infty} |x - v|\,f(x)\,dx.$

Prove that M is a minimum when v is the median, that is, when the
ordinate at x = v bisects the area under y = f(x).

Solution. We may write the expression for M in the form

$M = \int_{-\infty}^{v} (v - x)\,f(x)\,dx + \int_{v}^{\infty} (x - v)\,f(x)\,dx.$

It is shown in treatises on advanced calculus that if

$B(\theta) = \int_a^b f(x, \theta)\,dx,$

$\theta$ being a parameter and a and b being functions of $\theta$, then

$\frac{dB}{d\theta} = \int_a^b \frac{\partial f}{\partial \theta}\,dx - f(a, \theta)\,\frac{da}{d\theta} + f(b, \theta)\,\frac{db}{d\theta}.$

Therefore, differentiating M with respect to v and equating the result to
zero, we have

$\int_{-\infty}^{v} f(x)\,dx - \int_{v}^{\infty} f(x)\,dx = 0.$

Hence dM/dv = 0 when $\int_{-\infty}^{v} f(x)\,dx = \int_{v}^{\infty} f(x)\,dx$, that is, when the
partial areas to the right and left of v are equal. (It is left to the student
to show that M is actually a minimum when dM/dv = 0.)

3. Prove that the relation between the mean deviation (about the mean) and
the standard deviation of the normal curve (in arbitrary units) is

$M = (2/\pi)^{1/2}\,\sigma = .798\,\sigma$, approximately.

Hint. By definition,

$M = \int_{-\infty}^{\infty} |x - \bar{x}|\,y\,dx = \sigma \int_{-\infty}^{\infty} \phi(t)\,|t|\,dt = 2\sigma \int_0^{\infty} \phi(t)\,t\,dt.$




4. Suppose x is distributed in accord with the frequency curve $y = Ce^{-x/a}$,
$0 \le x \le \infty$, a being a positive constant and C being determined by the
condition that the area under the curve is N. Evaluate $\nu_k$ successively
for k = 1, 2, 3, 4. Then find $\mu_k$ for k = 2, 3, 4, and finally obtain the
values $\bar{x} = a$, $\sigma = a$, $\alpha_3 = 2$, $\alpha_4 = 9$.

6. Given fix) ~ 0 a: ^ oo , where C is determined by the condition 

that the area under the curve is unity. Evaluate $\nu_k$ for k = 1 to 4, $\mu_k$ for
k = 2 to 4, and $\alpha_k$ for k = 3, 4. Show that $\alpha_3$ and $\alpha_4$ satisfy the criterion
$2\alpha_4 - 3\alpha_3^2 - 6 = 0$.
6. State the differential equation underlying the Pearson system of frequency
curves and derive the equation of the normal curve as a special solution
of this equation. Evaluate the constant of integration so that the area
under the curve is unity.

7. Discuss the Type III curve.

8. Show that y in (22) vanishes when t = -A and t = $\infty$.

9. Read Chapter III of the Carus Monograph on Mathematical Statistics by
H. L. Rietz.

10. Explain how the probability integral (37) may be evaluated for (a) small
values of t, (b) large values of t.

11. Evaluate (37) for (a) $t = \sqrt{2}/2$, (b) $t = 2\sqrt{2}$.

12. Consult the reference cited for the proofs of (47) and (48) and give a report
on them.

13. By successive differentiation of $\phi(t)$ evaluate $H_m(t)$ from (46) for m = 1, 2,
3, 4. Check your results with (49) for n = 1, 2, 3.

14. Making use of the biorthogonal property of Hermite polynomials and
derivatives of the normal curve, derive the values of $c_n$, n = 0 to 4, in the
Type A series.

15. Taking t = 0, ±1, ±2, ±3, plot (52) on the same axes when (a) $\alpha_3 = 0$ and
$\alpha_4 = 3$, (b) $\alpha_3 = -1.2$ and $\alpha_4 = 3$, (c) $\alpha_3 = -1.2$ and $\alpha_4 = 4.2$. In (b) if
$\alpha_3 = 1.2$, what effect would this have on the curve?



CHAPTER IV 


JOINT DISTRIBUTIONS OF TWO VARIABLES. THE NORMAL 
CORRELATION SURFACE 

1. Fundamental Notions. Definitions of a frequency function of
one variable and the associated notion of probability were given in
Chapter III. Corresponding definitions will now be given for an
arbitrary probability distribution of two variables. The continuous
variables (x, y) have the joint probability function f(x, y) if the
double integral of f(x, y) over a region of the (x, y)-plane measures
the relative frequency of occurrence of pairs of values (x, y) in that
region. It will be understood that f(x, y) is continuous, single-valued,
and non-negative. If values of (x, y) are restricted to a
finite region we define f(x, y) to be identically zero outside that
region. In the extended region of definition, we have

(1)  $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dy\,dx = 1.$

Geometrically, this means that the volume under the surface
represented by z = f(x, y) is unity. Then f(x, y) dy dx is the probability
that simultaneously x lies in the interval (x, x + dx) and y lies in
the interval (y, y + dy). Consequently,

(2)  $\int_a^b\int_c^d f(x, y)\,dy\,dx$

represents the probability that x lies between a and b at the same
time that y lies between c and d.

We shall distinguish between two cases: (a) when the variables
are independent in the probability sense, and (b) when they are
correlated. Let the probability be g(x) dx that x occurs in dx for
all y's. Then integrating over all admissible values of y, we have

(3)  $g(x)\,dx = dx \int_{-\infty}^{\infty} f(x, y)\,dy.$

It is clear that the integral in (3) gives g(x) because the relative
frequency of occurrence of x in any interval (a, b) is the relative
frequency of pairs (x, y) belonging to the strip of the xy-plane for
which a < x < b, and this is

$\int_a^b\int_{-\infty}^{\infty} f(x, y)\,dy\,dx = \int_a^b g(x)\,dx.$

Similarly, if h(y) dy is the probability that y occurs in dy for all
assignments of x, we have

(4)  $h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx.$

In accordance with convention we shall call g(x) and h(y) the marginal
distributions.

The independence of x and y is characterized by the following

Definition. The variables x and y are independent when f(x, y)
= g(x)h(y). If f(x, y) cannot be expressed identically as the product
of the marginal distributions, then x and y are said to be correlated.

2. Moments. Let the general product moment about the common
origin of x and y be defined as follows:

(5)  $\nu_{mn} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,x^m y^n\,dy\,dx.$

If m = 0 and n = 1, we have

(6)  $\nu_{01} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,y\,dy\,dx.$

Let f(x, y) be a function in which the order of integration may be
interchanged. Then (6) becomes

$\nu_{01} = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} f(x, y)\,dx\right] y\,dy = \int_{-\infty}^{\infty} h(y)\,y\,dy,$

which is the mean, $\bar{y}$, of the y's. Similarly, the mean of the x's is

$\bar{x} = \nu_{10} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,x\,dy\,dx = \int_{-\infty}^{\infty} g(x)\,x\,dx.$

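We will now define the general product moment about the means
$(\bar{x}, \bar{y})$ as follows:

$\mu_{mn} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{x})^m (y - \bar{y})^n f(x, y)\,dy\,dx.$

When m = n = 1, we have

$\mu_{11} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{x})(y - \bar{y})\,f(x, y)\,dy\,dx,$

which is styled the co-variance of the joint distribution.
When m = 2 and n = 0, we have the variance of x,

$\mu_{20} = \sigma_x^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{x})^2 f(x, y)\,dy\,dx = \int_{-\infty}^{\infty} (x - \bar{x})^2 g(x)\,dx.$

Similarly, when m = 0 and n = 2, we have the variance of y,

$\mu_{02} = \sigma_y^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (y - \bar{y})^2 f(x, y)\,dy\,dx = \int_{-\infty}^{\infty} (y - \bar{y})^2 h(y)\,dy.$

It is left as an exercise for the student to show that

$\mu_{20} = \nu_{20} - \nu_{10}^2.$

The coefficient of correlation between x and y, denoted by $\rho_{xy}$,
is defined by

(13)  $\rho_{xy} = \frac{\mu_{11}}{\sigma_x \sigma_y}.$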

3. Regression. If y has been assigned in the joint probability
function f(x, y), the probability that x will lie in an infinitesimal
interval is

$\frac{f(x, y)}{h(y)}\,dx.$

Thus, when y is fixed,

$\int \frac{f(x, y)}{h(y)}\,dx = 1,$

and so f(x, y)/h(y) is the probability function of x for a fixed y.
It may be called the probability density representing a y array of x's.
Likewise, if we fix x the probability density for an x array of y's
is given by f(x, y)/g(x), since

$\int \frac{f(x, y)}{g(x)}\,dy = 1$

when x is fixed.

The notion of arrays may be made more concrete by thinking
of a joint distribution of the heights and weights of men. If x refers
to weight and y to height, then an example of an x array of y's is the
distribution of the heights of all men who weigh 150 pounds, and the
weights of all men who are six feet tall is an example of a y array of x's.

The mean of an x array of y's is

(14)  $\bar{y}_x = \int y\,\frac{f(x, y)}{g(x)}\,dy,$

where the integration is performed over all values in the array
defined by x. Similarly, the mean of a y array of x's is

(15)  $\bar{x}_y = \int x\,\frac{f(x, y)}{h(y)}\,dx$

integrated over all x's in an array for a fixed y.

The variance in an x array of y's is given by

(16)  $S_{y \cdot x}^2 = \int (y - \bar{y}_x)^2\,\frac{f(x, y)}{g(x)}\,dy$

integrated over all values in the array fixed by x. Similarly, the
variance in a y array of x's is

(17)  $S_{x \cdot y}^2 = \int (x - \bar{x}_y)^2\,\frac{f(x, y)}{h(y)}\,dx$

integrated over all values in the array fixed by y.

Taking different x arrays of y's fixes the mean points $\bar{y}_x$ and as x
varies continuously we get the locus of these means which is called the
regression curve of y on x. Its equation is given by (14) where now, of
course, x is a variable. Similarly, (15) gives the regression curve of x
on y. Of particular interest and use are the cases in which these
regression curves are straight lines. If the equation of the regression
curve of y on x is of the form

$\bar{y}_x = Ax + B,$

then the regression of y on x is said to be linear. Similarly, if the
equation of the regression curve of x on y is of the form

$\bar{x}_y = Cy + D,$

then the regression of x on y is said to be linear. If one regression
system is linear the other is not necessarily linear.

Let us now consider the implications of linear regression on the
joint probability function f(x, y) and the marginal totals g(x) and
h(y). Consider

$\bar{y}_x = \frac{\int y\,f(x, y)\,dy}{g(x)} = Ax + B,$

or

(18)  $\int y\,f(x, y)\,dy = A\,x\,g(x) + B\,g(x).$

Integrating each side of (18) with respect to x, and remembering
that we may interchange the order of integration, we have

$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\,f(x, y)\,dy\,dx = A\int_{-\infty}^{\infty} x\,g(x)\,dx + B\int_{-\infty}^{\infty} g(x)\,dx,$

or

(19)  $\nu_{01} = A\,\nu_{10} + B.$

Multiplying each side of (18) by x and integrating with respect to x,
we have

$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\,y\,f(x, y)\,dy\,dx = A\int_{-\infty}^{\infty} x^2 g(x)\,dx + B\int_{-\infty}^{\infty} x\,g(x)\,dx.$

Since the left member is

$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\,y\,f(x, y)\,dy\,dx = \nu_{11},$

we have

(20)  $\nu_{11} = A\,\nu_{20} + B\,\nu_{10}.$
A simultaneous solution of (19) and (20) yields

$A = \frac{\nu_{11} - \nu_{10}\nu_{01}}{\nu_{20} - \nu_{10}^2} = \frac{\mu_{11}}{\mu_{20}} = \rho\,\frac{\sigma_y}{\sigma_x},$

$B = \nu_{01} - \nu_{10}\,\frac{\sigma_y}{\sigma_x}\,\rho = \bar{y} - \bar{x}\,\frac{\sigma_y}{\sigma_x}\,\rho.$

Therefore the equation of the line of regression of y on x becomes

(21)  $\bar{y}_x - \bar{y} = \rho\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x}).$

In an analogous manner, if the regression of x on y is linear the
regression line has the equation

(22)  $\bar{x}_y - \bar{x} = \rho\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y}).$

The quantities $A = \rho(\sigma_y/\sigma_x)$ and $C = \rho(\sigma_x/\sigma_y)$ are called the
regression coefficients. It is obvious that their product is $\rho^2$.

Example. Given

$f(x, y) = \frac{2}{a^2}, \quad 0 \le x \le y, \quad 0 \le y \le a,$

as the joint probability function of two variables x and y. Find
(i) the marginal totals g(x) and h(y); (ii) the mean and variance of
each of the marginal totals, i.e., $\nu_{10}$ and $\sigma_x^2 = \mu_{20}$ for g(x), $\nu_{01}$
and $\sigma_y^2 = \mu_{02}$ for h(y); (iii) the equations of the regression curves
of y on x and of x on y, $\bar{y}_x$ and $\bar{x}_y$; (iv) the correlation coefficient $\rho$.

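Solutions. The volume under the surface represented by the given function
is unity. Thus

$\int_0^a\int_0^y \frac{2}{a^2}\,dx\,dy = \frac{2}{a^2}\int_0^a y\,dy = 1.$

The surface is shown above.

(i) The marginal totals are

$g(x) = \int_x^a \frac{2}{a^2}\,dy = \frac{2}{a^2}(a - x), \qquad h(y) = \int_0^y \frac{2}{a^2}\,dx = \frac{2y}{a^2}.$

(ii) The means are

$\nu_{10} = \int_0^a x\,\frac{2}{a^2}(a - x)\,dx = \frac{a}{3}, \qquad \nu_{01} = \int_0^a y\,\frac{2y}{a^2}\,dy = \frac{2a}{3}.$

Since

$\nu_{20} = \int_0^a x^2\,\frac{2}{a^2}(a - x)\,dx = \frac{a^2}{6}, \qquad \nu_{02} = \int_0^a y^2\,\frac{2y}{a^2}\,dy = \frac{a^2}{2},$

the variances are

$\mu_{20} = \sigma_x^2 = \frac{a^2}{6} - \frac{a^2}{9} = \frac{a^2}{18}, \qquad \mu_{02} = \sigma_y^2 = \frac{a^2}{2} - \frac{4a^2}{9} = \frac{a^2}{18}.$

(iii) The regression lines are

$\bar{y}_x = \frac{\int_x^a y\,(2/a^2)\,dy}{2(a - x)/a^2} = \frac{a + x}{2}, \qquad \bar{x}_y = \frac{\int_0^y x\,(2/a^2)\,dx}{2y/a^2} = \frac{y}{2}.$

(iv) From the equations of the regression lines it follows that $\rho^2 = \frac{1}{4}$ and
$\rho = \frac{1}{2}$ since $\rho(\sigma_y/\sigma_x)$ is positive.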
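
The results of the Example can be checked by brute-force integration. The following sketch is an illustration only: it takes the joint function to be $f(x, y) = 2/a^2$ on $0 \le x \le y \le a$, uses a midpoint rule on a square grid (cells cut by the line y = x are counted at half weight), and the function name and grid size are our own choices.

```python
def triangle_moments(a=1.0, n=400):
    """Midpoint-rule moments of f(x, y) = 2/a^2 on 0 <= x <= y <= a."""
    h = a / n
    density = 2.0 / (a * a)
    s = {"1": 0.0, "x": 0.0, "y": 0.0, "xx": 0.0, "yy": 0.0, "xy": 0.0}
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(i, n):
            y = (j + 0.5) * h
            # Cells cut by the line y = x lie half inside the region.
            w = density * h * h * (0.5 if i == j else 1.0)
            s["1"] += w
            s["x"] += w * x
            s["y"] += w * y
            s["xx"] += w * x * x
            s["yy"] += w * y * y
            s["xy"] += w * x * y
    mean_x, mean_y = s["x"], s["y"]
    var_x = s["xx"] - mean_x ** 2          # mu_20 = nu_20 - nu_10^2
    var_y = s["yy"] - mean_y ** 2          # mu_02 = nu_02 - nu_01^2
    cov = s["xy"] - mean_x * mean_y        # mu_11 = nu_11 - nu_10 nu_01
    rho = cov / (var_x * var_y) ** 0.5
    return s["1"], mean_x, mean_y, var_x, var_y, rho

if __name__ == "__main__":
    print(triangle_moments())
```

For a = 1 the returned values should lie close to 1, a/3, 2a/3, a²/18, a²/18 and 1/2, in agreement with parts (ii) and (iv).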

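4. The Standard Error of Estimate. We have seen that the
probability density in an x array of y's is f(x, y)/g(x). Then the
variance $S_{y \cdot x}^2$ within such an array is

$S_{y \cdot x}^2 = \int (y - \bar{y}_x)^2\,\frac{f(x, y)}{g(x)}\,dy.$

The mean, over all x arrays, of values of $S_{y \cdot x}^2$ weighted with the
marginal distribution of x is denoted by $S_y^2$ and $S_y$ is called the
standard error of estimate. We will now show that, when the regression
of y on x is linear, $S_y^2 = \sigma_y^2(1 - \rho^2)$.
By definition,

$S_y^2 = \int g(x)\,S_{y \cdot x}^2\,dx = \int\!\!\int (y - \bar{y}_x)^2 f(x, y)\,dy\,dx.$

Using the value of $\bar{y}_x$ given in (21) the above expression becomes

$S_y^2 = \int\!\!\int \left[\,y - \bar{y} - \rho\,\frac{\sigma_y}{\sigma_x}(x - \bar{x})\right]^2 f(x, y)\,dy\,dx$

$\quad = \int\!\!\int \left[(y - \bar{y})^2 - 2\rho\,\frac{\sigma_y}{\sigma_x}(y - \bar{y})(x - \bar{x}) + \rho^2\,\frac{\sigma_y^2}{\sigma_x^2}(x - \bar{x})^2\right] f(x, y)\,dy\,dx,$

and the right member simplifies, since the three integrals are $\sigma_y^2$,
$\rho\,\sigma_x\sigma_y$, and $\sigma_x^2$ respectively, so that we have the result

$S_y^2 = \sigma_y^2(1 - \rho^2).$

From this result it follows that

$-1 \le \rho \le 1.$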
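
As a numerical illustration of this result, the sketch below integrates $(y - \bar{y}_x)^2 f(x, y)$ directly for the joint function of the Example in § 3, taken here to be $f = 2/a^2$ on $0 \le x \le y \le a$ with regression line $\bar{y}_x = (a + x)/2$, $\sigma_y^2 = a^2/18$ and $\rho = 1/2$. The midpoint grid and the function name are assumptions of ours, not part of the text.

```python
def standard_error_check(a=1.0, n=400):
    """Return (S_y^2 by direct integration, sigma_y^2 (1 - rho^2))."""
    h = a / n
    density = 2.0 / (a * a)
    s_y_sq = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        y_bar_x = (a + x) / 2.0          # regression of y on x from the Example
        for j in range(i, n):
            y = (j + 0.5) * h
            # Cells on the line y = x are counted at half weight.
            w = density * h * h * (0.5 if i == j else 1.0)
            s_y_sq += w * (y - y_bar_x) ** 2
    sigma_y_sq = a * a / 18.0            # variance of y from the Example
    rho = 0.5                            # correlation from the Example
    return s_y_sq, sigma_y_sq * (1.0 - rho ** 2)

if __name__ == "__main__":
    print(standard_error_check())
```

Both numbers should agree with $a^2/24$ to within the accuracy of the grid.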

5. The Normal Correlation Surface. We shall now consider a
joint probability function of special interest. The normal correlation
surface is defined by the following function

(23)  $f(x, y) = K e^{-P},$

where

$P = \frac{1}{2(1 - \rho^2)}\left[\frac{x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right],$

$\frac{1}{K} = 2\pi\,\sigma_x\sigma_y(1 - \rho^2)^{1/2},$

$-\infty \le x \le \infty, \quad -\infty \le y \le \infty,$

and the variables x and y have the origin of their reference system at their
respective means, that is,

(24)  $\bar{x} = 0, \quad \bar{y} = 0.$

These conditions (24) may be imposed without essential loss of
generality and will simplify the algebraic discussion.
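The marginal distribution of x is given by

(25)  $g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = K e^{-x^2/2\sigma_x^2}\int_{-\infty}^{\infty} \exp\left[-\frac{1}{2(1 - \rho^2)}\left(\frac{y}{\sigma_y} - \frac{\rho x}{\sigma_x}\right)^2\right] dy = \frac{1}{\sigma_x\sqrt{2\pi}}\,e^{-x^2/2\sigma_x^2}.$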

Similarly, the marginal distribution of y is

(26)  $h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \frac{1}{\sigma_y\sqrt{2\pi}}\,e^{-y^2/2\sigma_y^2}.$

Hence we may state

Theorem I. If two variables are normally correlated, each variable
is normally distributed in its marginal totals.

That the converse is not necessarily true is shown by the following
illustration. Consider a clay model of a normal correlation surface
such that its marginal totals are necessarily normal distributions by
the above theorem. Quantities of the clay can be redistributed by
piling up in certain spots the clay that is scooped out in other spots
in such a way that the marginal totals are not disturbed. It is obvious
that the resulting surface is not one that is defined by (23).

Other interesting properties of normally correlated variables are
described by the following theorems.

Theorem II. The regression systems of a normal correlation surface
are linear.

The proof is a matter of integration. Let us find the probability
function of an x array of y's. By definition, this is given by
f(x, y)/g(x). To get the mean of such an array we must multiply
its probability distribution by y and integrate over all values of
y in the array. Thus we have

$\bar{y}_x = \int_{-\infty}^{\infty} \frac{y\,f(x, y)}{g(x)}\,dy = \frac{1}{\sigma_y\{2\pi(1 - \rho^2)\}^{1/2}}\int_{-\infty}^{\infty} y\,\exp\left[-\frac{1}{2(1 - \rho^2)}\left(\frac{y}{\sigma_y} - \frac{\rho x}{\sigma_x}\right)^2\right] dy = \frac{x\rho\,\sigma_y}{\sigma_x}.$

In the exercises at the end of the chapter the student is asked to
verify the above result. If x is allowed to vary over the arrays, it is
evident that the locus of the means of the x arrays of y's is the line

(27)  $\bar{y}_x = \frac{x\rho\,\sigma_y}{\sigma_x}.$

In a similar way the mean of a y array of x's is given by

$\bar{x}_y = \int_{-\infty}^{\infty} \frac{x\,f(x, y)}{h(y)}\,dx = \frac{y\,\sigma_x\rho}{\sigma_y}$

and this lies on the regression line

(28)  $\bar{x}_y = \frac{y\rho\,\sigma_x}{\sigma_y}.$

While it is an intrinsic property of a normal correlation surface
that both regressions are linear, one should not infer that this is
characteristic of joint probability functions in general. One or both
or neither of the regression systems of an arbitrary distribution
function may be linear. The student will observe that the definition
of the correlation coefficient did not involve the condition that
f(x, y) was normal nor that regression was linear. Although the
definition of a correlation coefficient does not require linear regression,
nevertheless the correlation coefficient may fail to measure the
correlation in the case of appreciable non-linear regression.

Theorem III. If x and y are normally correlated, then each array is
a normal distribution with constant variance $S_y^2$ from one array of y's
to another and constant variance $S_x^2$ from one array of x's to another.

The proof consists in exhibiting the distribution function for an
x array of y's and for a y array of x's. Thus, for the first case we have

(29)  $\frac{f(x, y)}{g(x)} = \frac{1}{\sqrt{2\pi}\,S_y}\,\exp\left[-\frac{1}{2S_y^2}\left(y - \frac{\rho\,\sigma_y}{\sigma_x}\,x\right)^2\right],$

where $S_y^2 = \sigma_y^2(1 - \rho^2)$. Evidently, this is a normal distribution
with variance $S_y^2$ which is independent of x and therefore is constant
over all x arrays. It is left as an exercise for the student to give the
companion proof for the arrays in the y direction.

When the variance is constant over the arrays in the x direction
the regression system of y on x is said to be homoscedastic (equally
scattered). Similarly for the y direction. A geometrical representation
of a normal correlation surface is given in Part I, § 18 of
Chapter VIII.
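
Theorems I to III suggest a simple way of drawing pairs from a normal correlation surface: take x from its marginal distribution and then y from the x array, which is normal about the regression line with standard deviation $S_y$. The sketch below is our own illustration under that reading; the function names, sample size, and seed are assumptions, and the sample correlation is only expected to be near $\rho$, not equal to it.

```python
import math
import random

def sample_normal_pairs(n, sigma_x, sigma_y, rho, seed=0):
    """Draw (x, y) with x from its marginal and y from its x array, as in (29)."""
    rng = random.Random(seed)
    s_y = sigma_y * math.sqrt(1.0 - rho * rho)   # standard error of estimate
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, sigma_x)
        y = rng.gauss(rho * sigma_y / sigma_x * x, s_y)
        pairs.append((x, y))
    return pairs

def sample_correlation(pairs):
    """Product-moment correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

if __name__ == "__main__":
    data = sample_normal_pairs(20000, 2.0, 3.0, 0.6)
    print(sample_correlation(data))
```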

6. Limiting Forms. Suppose a plane is passed through the surface
defined by (23) parallel to the xy-plane. Analytically, this means
that we let f(x, y) = c where c is some constant less than the maximum
value of the function, that is, we take 0 < c < K to insure a real
intersection. We obtain

(30)  $\frac{x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2} = \lambda^2,$

where

(30a)  $\lambda^2 = 2(1 - \rho^2)\log_e\frac{K}{c},$

which is obviously not negative. Thus the points (x, y) for which
the probability density is constant lie on an ellipse.

It is easier to study (30) if we transform the variables to standard
units by letting $t_x = x/\sigma_x$ and $t_y = y/\sigma_y$. Then (30) becomes

(31)  $t_x^2 - 2\rho\,t_x t_y + t_y^2 = \lambda^2.$

The cross-product term will vanish under the transformations

$t_x = u\cos\theta - v\sin\theta,$
$t_y = u\sin\theta + v\cos\theta$

when $\theta = \pi/4$. So the required rotation formulas are

(32)  $t_x = \frac{u - v}{(2)^{1/2}}$  and  $t_y = \frac{u + v}{(2)^{1/2}}.$
Applying these to (31) we obtain

(33)  $u^2(1 - \rho) + v^2(1 + \rho) = \lambda^2$

which may be written in the standard form

(34)  $\frac{u^2}{a^2} + \frac{v^2}{b^2} = 1,$

where

$a^2 = \frac{\lambda^2}{1 - \rho}$  and  $b^2 = \frac{\lambda^2}{1 + \rho}.$

The eccentricity of the ellipse (34) is $(1 - b^2/a^2)^{1/2} = [2\rho/(1 + \rho)]^{1/2}$.
We see that $b \to a$ as $\rho \to 0$. When $\rho = 0$, $b = a = \lambda$. Then (34)
would be a circle, and (23) would be a surface of revolution if the
variables were expressed in standard units. When $\rho = 1$, it follows
from (33) and (30a) that v = 0. From (32) it is seen that the line
v = 0 is the same as $t_y = t_x$, and the ellipse has degenerated into a
straight line. The surface then shrinks into a normal curve in the
plane $t_y = t_x$.
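
The dependence of the contour ellipse on $\rho$ is easy to tabulate. The sketch below is illustrative only: it evaluates the semi-axes of (34) and the eccentricity for $0 \le \rho < 1$ in standard units, with a function name and a default $\lambda = 1$ of our own choosing.

```python
import math

def contour_ellipse(rho, lam=1.0):
    """Semi-axes and eccentricity of (34) for 0 <= rho < 1, in standard units."""
    a = lam / math.sqrt(1.0 - rho)   # semi-axis along u, i.e. along the line t_y = t_x
    b = lam / math.sqrt(1.0 + rho)   # semi-axis along v
    ecc = math.sqrt(1.0 - (b * b) / (a * a))
    return a, b, ecc

if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6, 0.9):
        print(rho, contour_ellipse(rho))
```

As $\rho$ approaches 1 the semi-axis a grows without bound while b tends to $\lambda/\sqrt{2}$, which is the numerical counterpart of the ellipse stretching out along the line $t_y = t_x$ for a fixed $\lambda$.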

7. Tetrachoric Correlation. The word tetrachoric refers to a
2 × 2 fold table. Suppose N objects are classified according as
they possess one or both or neither of two qualitative traits or
attributes which may, for convenience, be denoted by I and II. Such
a classification will yield a four fold table as shown in Table 4,

                     Table 4

                Not II       II        Total
   Not I           a          b        a + b
   I               c          d        c + d
   Total         a + c      b + d        N

where a + b + c + d = N, the four classes being mutually exclusive
but not necessarily exhaustive. The attributes may sometimes
admit also of quantitative measurement but we are considering only
the case where they are classified in dichotomy, such as "tall" and
"not tall," "male" and "female," "alive" and "dead," "good"
and "bad," "dull" and "not dull," etc. An example is the following
classification of 26,287 children where attribute I is dullness and
attribute II is developmental defects.

          Table 5. (K. Pearson, Tables, p. li)

                Without       With
                Defects      Defects      Totals
   Not Dull      22,793        1,420      24,213
   Dull           1,186          888       2,074
   Totals        23,979        2,308      26,287


The problem in such classifications is to measure the intensity
of association between the two attributes in the set. Let us suppose
that our data had been given initially so that a fine division into
many cells was possible and that the result would have presented
a normal correlation surface. If this surface were then divided into
four cells by planes x = h and y = k to yield the relative frequencies
observed, then the correlation coefficient that characterizes this
normal correlation surface is called tetrachoric r. It will be denoted
by $r_t$.

K. Pearson has given a method and tables for determining $r_t$.
(Cf. Tables for Statisticians and Biometricians, Part I.) The
procedure may be indicated by the following diagram and skeleton
solution for our example, Table 5. (The details will be found in the
reference cited.)

Solution of Example. (See Figure 11, page 76.)

$\frac{1}{\sqrt{2\pi}}\int_h^{\infty} e^{-x^2/2}\,dx = \frac{2074}{26287} = .0789, \quad h = 1.412,$

$\frac{1}{\sqrt{2\pi}}\int_k^{\infty} e^{-y^2/2}\,dy = \frac{2308}{26287} = .0878, \quad k = 1.354.$

Entering Pearson's Tables for the above values of h and k and interpolating, it is
found that $r_t = .652$.

The determination of $r_t$ by Pearson's method is rather tedious when
$.2 \le |r_t| \le .8$. This burden has been lifted by two fairly recent publications.
Camp has given in his text (pp. 307-310) an ingenious and simple method for
approximating $r_t$. His scheme is interesting from the mathematical as well as the
practical point of view. In Computing Diagrams for the Tetrachoric Correlation
Coefficient by Thurstone et al. (available at the University of Chicago Bookstore),
a useful approximation to $r_t$ can be determined by inspection.
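
With a computing machine the definition of $r_t$ can also be applied directly. The sketch below is not Pearson's tabular method: it finds h and k from the marginal proportions and then searches by bisection for the correlation of a standard normal correlation surface whose quadrant x > h, y > k holds the proportion d/N. The function names, the Simpson integration, and its step count are our own assumptions. For Table 5 the upper right cell is taken as 1,420, the value implied by the marginal totals 24,213 and 2,308.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal curve

def upper_quadrant(h, k, r, steps=2000, span=8.0):
    """P(x > h, y > k) on a standard normal correlation surface, by Simpson's rule."""
    s = math.sqrt(1.0 - r * r)
    width = span / steps
    total = 0.0
    for i in range(steps + 1):
        x = h + i * width
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        # Given x, the array of y's is normal with mean r*x and standard deviation s.
        total += weight * STD.pdf(x) * (1.0 - STD.cdf((k - r * x) / s))
    return total * width / 3.0

def tetrachoric(a, b, c, d):
    """Return (h, k, r_t) for a fourfold table laid out as in Table 4."""
    n = a + b + c + d
    h = STD.inv_cdf(1.0 - (c + d) / n)   # cut for attribute I
    k = STD.inv_cdf(1.0 - (b + d) / n)   # cut for attribute II
    lo, hi = -0.999, 0.999
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if upper_quadrant(h, k, mid) < d / n:
            lo = mid                     # quadrant too small: raise r
        else:
            hi = mid
    return h, k, (lo + hi) / 2.0

if __name__ == "__main__":
    print(tetrachoric(22793, 1420, 1186, 888))
```

The bisection relies on the quadrant proportion increasing steadily with r, which holds for the normal correlation surface.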



Fig. 11

Exercises 

1. Show that the definition of $\rho$ may be written in the form

$\rho = \frac{1}{\sigma_x\sigma_y}\left[\int\!\!\int xy\,f(x, y)\,dy\,dx - \bar{x}\bar{y}\right].$

% Given that /(x, y) ^ 2/ a\ x y, y a. Show that both regres- 

sion systems are linear. Evaluate p. 

3. Derive (22). 

4. Prove that the area of the ellipse (30) is $\pi\lambda^2\sigma_x\sigma_y/(1 - \rho^2)^{1/2}$.

5. (a) If $\rho = .6$ show that the ratio between the major and minor axes of
the ellipse is 2.

(b) Show that the slope of the regression line of y on x for a normal
correlation surface is $\rho/(1 - \rho^2)^{1/2}$ in units of $S_y$ and $\sigma_x$.
6. Establish the truth or falsity of the following proposition: A necessary and

sufficient condition that two variables be normally correlated is that their 
regression systems be linear. 

7. Prove that the regression systems of two normally correlated variables are 

linear and homoscedastic. 


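8. For (23) prove the following:

(a) the mean value of $\bar{y}_x$ taken over all values of x is zero,

(b) the variance of $\bar{y}_x$ is equal to $\rho^2\sigma_y^2$,

(c) the correlation coefficient between $\bar{y}_x$ and y is equal to $\rho$.

Hints. (a) Evaluate $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \bar{y}_x\,f(x, y)\,dy\,dx.$

(b) $\sigma_{\bar{y}_x}^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \bar{y}_x^2\,f(x, y)\,dy\,dx.$

(c) Evaluate $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{\bar{y}_x\,y}{\sigma_{\bar{y}_x}\sigma_y}\,f(x, y)\,dy\,dx.$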


9. If x and y are discrete variables, $\rho$ is defined by

$\rho = \frac{E(xy) - E(x)E(y)}{\sigma_x\sigma_y}$

where

$\sigma_x = [E(x^2) - \{E(x)\}^2]^{1/2}, \quad \sigma_y = [E(y^2) - \{E(y)\}^2]^{1/2}$

and

$E(x) = \sum_{1}^{n} x_i\,g(x_i), \quad E(y) = \sum_{1}^{m} y_j\,h(y_j), \quad E(xy) = \sum_{1}^{n}\sum_{1}^{m} x_i y_j\,f(x_i, y_j),$

$f(x_i, y_j)$ being the probability for the simultaneous occurrence of the pair
of values $(x_i, y_j)$, $g(x_i) = \sum_{j=1}^{m} f(x_i, y_j)$ and $h(y_j) = \sum_{i=1}^{n} f(x_i, y_j)$ being the
marginal distributions of x and y, respectively. Find $\rho$ for the table in
Exercise 8, § 13, Chapter I.

10. Investigate the references given for tetrachoric r and give a report on the
results of your study.



CHAPTER V 

MULTIPLE AND PARTIAL CORRELATION 

1. Notation. Simple correlation theory deals with co-variation
in two variables. If other factors are involved the two variables
are assessed as the important ones for the investigation and the
other factors are ignored. But situations frequently arise in the
fields of agriculture, biology, economics, education, and psychology,
which call for consideration of three or more influences bearing
simultaneously on a problem, and hence for the investigation of
interrelations among three or more variables. For example, crop
yield varies with soil fertility, rainfall, and temperature; wheat
production is affected by acreage planted and yield per acre;
students' honor points are connected with intelligence, health, hours
of study, etc.; their chest measurements vary with stature and
weight.

The term multiple correlation refers to a theory of correlation
involving three or more variables. For ease in exposition we shall
restrict the derivation of formulas to the three-variable case although
the method is perfectly general. When the three-variable case is
understood the formulas can be generalized for k variables.

The framework of a two-way table was a rectangle in the xy-plane
which was divided into cells by lines parallel to the axes. The
analogue in the case of three variables, which we shall denote by
x, y, and z, is a rectangular parallelepiped divided into cells by slicing
planes parallel to the axes.

We shall denote the frequency in the cell whose mid-point has the
coordinates (x, y, z) by f(x, y, z). A pair of (x, y) values fixes a
z column (Figure 12), and the sum of the frequencies in such a
column is the "column total"

(1)  $\sum_z f(x, y, z) = f(x, y),$

where here and subsequently the symbol $\sum$ together with the
variable underneath denotes a summation in the direction of that
variable. Now consider all those columns which have the same
y. Their total frequency, denoted by

(2)  $\sum_x f(x, y) = f(y),$

may appropriately be called a "slab total" (Figure 13).

Finally, if we add all the slab totals we get the total frequency N.
Thus

(3)  $\sum_y f(y) = N.$
By making use of (1) we may, if we wish, express (2) as the double sum

(4)  $\sum_x\sum_z f(x, y, z) = f(y),$

and using (4) we may express (3) as the triple sum

(5)  $\sum_x\sum_y\sum_z f(x, y, z) = N.$
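
The column totals, slab totals, and N are simply successive sums over a three-way array of cell frequencies. As an illustration only, the sketch below stores the frequencies in a nested list indexed as `f[x][y][z]`; this layout and the function name are our own conventions.

```python
def totals(f):
    """Column totals f(x, y), slab totals f(y), and N for a nested list f[x][y][z]."""
    nx, ny = len(f), len(f[0])
    column = [[sum(f[x][y]) for y in range(ny)] for x in range(nx)]    # as in (1)
    slab = [sum(column[x][y] for x in range(nx)) for y in range(ny)]   # as in (2)
    return column, slab, sum(slab)                                     # as in (3)

if __name__ == "__main__":
    cells = [[[1, 2], [3, 4]],
             [[5, 6], [7, 8]]]
    print(totals(cells))
```

Summing the same array in a different order gives the column and slab totals for the other two directions.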

(a) The aggregate of the column totals f(x, y) forms a two-way
frequency table. If we imagine the numerical values of these
frequencies written in the cells of the xy-plane it is easy to see that
they constitute a correlation table (Figure 14). For this table, the
simple correlation coefficient $r_{xy}$ is called the total correlation (in
contradistinction to a partial correlation coefficient to be defined
later) and the regression curves are called the total regressions of
y on x and x on y. Discussions analogous to (a) will now be given
for horizontal columns parallel (b) to Ox and (c) to Oy.



(b) A pair of (y, z) values fixes an x column parallel to Ox. The
sum of the frequencies in an x column is

(6)  $\sum_x f(x, y, z) = f(y, z).$

If we add all those columns which have the same z we get a slab
perpendicular to z whose total is

(7)  $\sum_y f(y, z) = f(z).$

Finally, the total of all such slabs is

(8)  $\sum_z f(z) = N.$
The numerical values of the totals f(y, z) written, if desired, in the
cells of the yz-plane form a two-way correlation table, as represented in
Figure 15. For this table, $r_{yz}$ is the total correlation coefficient
between y and z, and the regression curves are the total regressions of
y on z and z on y.

(c) Similarly, a pair of (x, z) values fixes a y column parallel to Oy.
The sum of the frequencies in such a column is

(9)  $\sum_y f(x, y, z) = f(x, z).$

If we add all the columns which have the same x we get a slab
perpendicular to x whose total is

(10)  $\sum_z f(x, z) = f(x).$

The sum of all such slabs is

(11)  $\sum_x f(x) = N.$

The numerical values of the column totals f(x, z) constitute a two-way
correlation table whose correlation coefficient $r_{xz}$ is the total
correlation between x and z. The total regressions of x on z and z on x
are given by the regression curves of this table (Figure 16).
2. Regression. The mean of a z column at (x, y) is defined by

(12)  $\bar{z}(x, y) = \frac{1}{f(x, y)}\sum_z z\,f(x, y, z).$

Similarly, the mean of an x column at (y, z) is

(13)  $\bar{x}(y, z) = \frac{1}{f(y, z)}\sum_x x\,f(x, y, z),$

and the mean of a y column at (x, z) is

(14)  $\bar{y}(x, z) = \frac{1}{f(x, z)}\sum_y y\,f(x, y, z).$

The regression plane of z on xy is that plane which fits the means
of the z columns best in a least-squares sense. This should not be
confused with the true regression surface, z on xy, which is defined
as the locus of the mean points of the z columns. More accurately,
it is the locus of these points as the dimensions of the cells approach
zero. The regression plane, z on xy, is that plane which fits best the
true regression surface, z on xy. Corresponding statements hold for
the regression planes of y on xz and of x on yz.

So far, it was convenient to designate our variables by the
conventional letters used in representing three-dimensional space. We
are now about to obtain the equations of the regression planes and
in order to extend our results to k variables it will be desirable to
change to a new set of symbols which will lend themselves more readily
to generalization. The switch will cause no difficulty. We shall now
use $x_1$ in place of z, $x_2$ in place of x, and $x_3$ in place of y.
The relations between the r's in the old notation and the new are
$r_{xy} = r_{23}$, $r_{yz} = r_{13}$, $r_{xz} = r_{12}$. The adjacent
diagram will help us keep in mind the relations between the new symbols
and the old.

[Diagram: correspondence between the old symbols x, y, z and the new
symbols $x_2$, $x_3$, $x_1$.]

We shall now derive the equation of the regression plane of $x_1$
on $x_2$ and $x_3$. In determining, under a least-squares criterion, the
parameters in its equation it will simplify the exposition if we assume
that the variables are measured from their respective means as origin.
This may be assumed without loss of generality. Let the desired
equation be of the form

(15)    $x_1 = A x_2 + B x_3 + C.$

Then we may determine the parameters in (15) so that the sum of
the squares of the residuals

(16)    $U = \sum_{1,2,3} f\,(x_1 - A x_2 - B x_3 - C)^2$

is a minimum, $f$ being short for $f(x_1, x_2, x_3)$, and $\sum_{1,2,3}$
for $\sum_{x_1} \sum_{x_2} \sum_{x_3}$.

Equating to zero the first partial derivatives of U with respect to
A, B, and C, we obtain the equations

        $\sum x_2 (x_1 - A x_2 - B x_3 - C) f = 0,$

        $\sum x_3 (x_1 - A x_2 - B x_3 - C) f = 0,$

        $C = 0.$

The simplification of the last equation is a consequence of our choice
of origin since $\sum x_1 f = \sum x_2 f = \sum x_3 f = 0$ when the origin
of $x_i$ is at the mean of its N values. The first two equations may be
written in the form

(17)    $A \sum x_2^2 f + B \sum x_2 x_3 f = \sum x_1 x_2 f,$
        $A \sum x_2 x_3 f + B \sum x_3^2 f = \sum x_1 x_3 f.$

Let $\sigma_i^2$ be the variance of $x_i$ and let $r_{ij}$ be the
correlation coefficient between $x_i$ and $x_j$. Then by definition,

        $\sum x_i^2 f(x_1, x_2, x_3) = N \sigma_i^2,$
        $\sum x_i x_j f(x_1, x_2, x_3) = N r_{ij} \sigma_i \sigma_j.$

So (17) becomes

(18)    $N A \sigma_2^2 + N B \sigma_2 \sigma_3 r_{23} = N \sigma_1 \sigma_2 r_{12},$
        $N A \sigma_2 \sigma_3 r_{23} + N B \sigma_3^2 = N \sigma_1 \sigma_3 r_{13}.$

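Solving for A and B we have

        $A = \frac{\sigma_1}{\sigma_2} \cdot \frac{\begin{vmatrix} r_{12} & r_{13} \\ r_{23} & 1 \end{vmatrix}}{\begin{vmatrix} 1 & r_{23} \\ r_{23} & 1 \end{vmatrix}}, \qquad B = \frac{\sigma_1}{\sigma_3} \cdot \frac{\begin{vmatrix} 1 & r_{23} \\ r_{12} & r_{13} \end{vmatrix}}{\begin{vmatrix} 1 & r_{23} \\ r_{23} & 1 \end{vmatrix}}.$

It is convenient both for simplicity and for the purpose of
generalizing to k variables to define the determinant R by

        $R = \begin{vmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{vmatrix}$

and to let $R_{ij}$ be the cofactor of $r_{ij}$, that is, the minor of
$r_{ij}$ including the sign factor $(-1)^{i+j}$. Thus,

        $R_{12} = -\begin{vmatrix} r_{21} & r_{23} \\ r_{31} & r_{33} \end{vmatrix}, \qquad R_{13} = \begin{vmatrix} r_{21} & r_{22} \\ r_{31} & r_{32} \end{vmatrix}.$

Clearly, $r_{11} = r_{22} = r_{33} = 1$, and $r_{12} = r_{21}$, etc., so
the expressions for A and B may be written

        $A = -\frac{\sigma_1 R_{12}}{\sigma_2 R_{11}}, \qquad B = -\frac{\sigma_1 R_{13}}{\sigma_3 R_{11}}.$

Hence (15) becomes

(19)    $\frac{x_1}{\sigma_1} R_{11} + \frac{x_2}{\sigma_2} R_{12} + \frac{x_3}{\sigma_3} R_{13} = 0.$
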


This equation gives the most probable value of $x_1$ for assigned
values of $x_2$ and $x_3$, provided that the true regression is not far
from being linear and the distribution of each $x_1$ column is nearly
symmetrical so that its mode is close to its mean. It is an important
equation because it shows how, on the average, changes in $x_2$ and
$x_3$ affect $x_1$. The student will observe that the R's involve only
simple correlation coefficients and that all the necessary computations
for the terms in (19) were explained in Part I.
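
As a numerical illustration of (19), the following minimal sketch computes A and B from the three simple correlations by way of the cofactors of the first row of R. The function names are our own, and the inputs (three illustrative correlations with unit standard deviations) are assumptions chosen only to exercise the formulas.

```python
def cofactors_row1(r12, r13, r23):
    """Cofactors R11, R12, R13 of the first row of the 3x3 determinant R."""
    R11 = 1.0 - r23 ** 2
    R12 = -(r12 - r13 * r23)
    R13 = r12 * r23 - r13
    return R11, R12, R13


def regression_plane(r12, r13, r23, s1, s2, s3):
    """Return (A, B) in x1 = A*x2 + B*x3, variables measured from their means."""
    R11, R12, R13 = cofactors_row1(r12, r13, r23)
    A = -s1 * R12 / (s2 * R11)
    B = -s1 * R13 / (s3 * R11)
    return A, B


# Illustrative inputs (assumed): r12 = .8, r13 = -.7, r23 = -.9, unit sigmas.
A, B = regression_plane(0.8, -0.7, -0.9, 1.0, 1.0, 1.0)
print(A, B)
```

With unit standard deviations the pair (A, B) must satisfy the two normal equations (18) after division by N, which gives a quick check on the arithmetic.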

There are two analogous equations for the regression planes of $x_2$
on $x_1 x_3$, and $x_3$ on $x_1 x_2$, which can be obtained readily from
(19) by a cyclical permutation of the subscripts on x and R. They are

(20)    $\frac{x_2}{\sigma_2} R_{22} + \frac{x_3}{\sigma_3} R_{23} + \frac{x_1}{\sigma_1} R_{21} = 0$

when $x_2$ is the dependent variable, and

(21)    $\frac{x_3}{\sigma_3} R_{33} + \frac{x_1}{\sigma_1} R_{31} + \frac{x_2}{\sigma_2} R_{32} = 0$

when $x_3$ is the dependent variable. Referred to an arbitrary origin
(19) would have been

(19a)   $\frac{X_1 - \bar{X}_1}{\sigma_1} R_{11} + \frac{X_2 - \bar{X}_2}{\sigma_2} R_{12} + \frac{X_3 - \bar{X}_3}{\sigma_3} R_{13} = 0,$

where $X_i - \bar{X}_i = x_i$. Analogous adjustments of (20) and (21) are
obvious when the variables are referred to an arbitrary origin.

The three-dimensional case can now be generalized. By methods
similar to those employed in deriving (15) we can derive the linear
regression equation for k variables. Thus we have the hyperplane $x_1$
on $x_2, x_3, \ldots, x_k$,

(22)    $\frac{x_1}{\sigma_1} R_{11} + \frac{x_2}{\sigma_2} R_{12} + \cdots + \frac{x_k}{\sigma_k} R_{1k} = 0,$

where $R_{ij}$ is the cofactor of $r_{ij}$ in

(23)    $R = \begin{vmatrix} r_{11} & \cdots & r_{1k} \\ \vdots & r_{22} & \vdots \\ r_{k1} & \cdots & r_{kk} \end{vmatrix}.$

When expressed in standard units, (22) becomes

(22a)   $t_1 = -\frac{1}{R_{11}} \sum_{i=2}^{k} R_{1i} t_i,$

where $t_i = x_i/\sigma_i$. Then $t_1$ may be regarded as a weighted mean
of the contributions of the other variables. The factor $R_{1i}$
represents the force or weight of $t_i$ when all these variables are
given an opportunity to predict the value of $t_1$.
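
For k variables the weights in (22a) can be computed directly from the cofactors of the first row of R. The sketch below is one possible way to do so; it uses a naive Laplace expansion, which is adequate only for the small k met in practice, and the function names and the 3 x 3 matrix of correlations are illustrative assumptions.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (small k only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def cofactor(m, i, j):
    """Cofactor of element (i, j), zero-based, with the sign (-1)**(i+j)."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)


def standard_weights(corr):
    """Weights b_i with t_1 = sum of b_i * t_i over i = 2..k, as in (22a)."""
    R11 = cofactor(corr, 0, 0)
    return [-cofactor(corr, 0, j) / R11 for j in range(1, len(corr))]


# Illustrative correlation matrix (assumed values).
corr = [[1.0, 0.8, -0.7],
        [0.8, 1.0, -0.9],
        [-0.7, -0.9, 1.0]]
weights = standard_weights(corr)
print(weights)
```

For three variables the two weights returned are A and B of (19) expressed in standard units.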

3. Standard Error of Estimate. In Part I (Chapter VIII) we
learned that $S_y^2 = \sigma_y^2(1 - r^2)$ was a measure of the closeness
with which the means of the x arrays of y clustered about the line of
regression of y on x. $S_y$ was called the standard error of estimate
and the larger r was, the smaller was $S_y$. We now seek an analogous
expression for the three-variable case. To this end let

(24)    $S_{1.23}^2 = \frac{1}{N} \sum_{1,2,3} \delta^2 f(x_1, x_2, x_3)$

where $\delta$ is the distance, measured parallel to the $x_1$-axis,
between the regression plane and the points $(x_1, x_2, x_3)$,
$\sum_{1,2,3}$ denoting a summation over all these points. That is,
$\delta$ = (observed $x_1$ - estimated $x_1$), the estimated $x_1$ being
given by (19). Then we may write

        $S_{1.23}^2 = \frac{\sigma_1^2}{N R_{11}^2} \sum \left( R_{11} \frac{x_1}{\sigma_1} + R_{12} \frac{x_2}{\sigma_2} + R_{13} \frac{x_3}{\sigma_3} \right)^2 f$

        $= \frac{\sigma_1^2}{R_{11}^2} \left( R_{11}^2 + R_{12}^2 + R_{13}^2 + 2 R_{11} R_{12} r_{12} + 2 R_{11} R_{13} r_{13} + 2 R_{12} R_{13} r_{23} \right)$

        $= \frac{\sigma_1^2}{R_{11}^2} \{ R_{11}(R_{11} + r_{12} R_{12} + r_{13} R_{13}) + R_{12}(R_{12} + r_{12} R_{11} + r_{23} R_{13}) + R_{13}(R_{13} + r_{13} R_{11} + r_{23} R_{12}) \}.$


According to Laplace's development of a determinant, the elements
of any row (or column) and their corresponding cofactors may be
used to develop R. If, in the resulting expression, the elements
of this row (or column) are replaced by the corresponding elements
of some other row (or column) the expression vanishes. Therefore,
we have

(25)    (a)  $R_{11} + r_{12} R_{12} + r_{13} R_{13} = R,$
        (b)  $R_{12} + r_{12} R_{11} + r_{23} R_{13} = 0,$
        (c)  $R_{13} + r_{13} R_{11} + r_{23} R_{12} = 0.$

Using (25) in the above derivation we obtain

(26)    $S_{1.23}^2 = \sigma_1^2 \frac{R}{R_{11}}.$

This is a kind of average variance in $x_1$ columns of the observed
values of $x_1$ from its corresponding estimated values on the
regression plane (19). The square root of (26),

(26a)   $S_{1.23} = \sigma_1 \left( \frac{R}{R_{11}} \right)^{1/2},$


is called the standard error of estimating $x_1$ from assigned values
of $x_2$ and $x_3$.

4. Standard Deviation of Estimated Values. Next, we shall
obtain an expression analogous to $\sigma_{Ey}$ of Part I (§ 7, Chapter
VIII) for the standard deviation of the estimated values given by (19).
The mean value of these estimates is zero since $x_i$ is measured from
its mean as origin. Therefore, the variance, $\sigma_{E1}^2$, of the
estimated values of $x_1$ is given by

(27)    $\sigma_{E1}^2 = \frac{\sigma_1^2}{R_{11}^2} \left( R_{12}^2 + R_{13}^2 + 2 R_{12} R_{13} r_{23} \right)$

        $= \frac{\sigma_1^2}{R_{11}^2} \{ R_{12}(R_{12} + R_{13} r_{23}) + R_{13}(R_{13} + R_{12} r_{23}) \}$

        $= \frac{\sigma_1^2}{R_{11}^2} \{ -R_{12} R_{11} r_{12} - R_{13} R_{11} r_{13} \}$    by (b) and (c) of (25)

        $= \frac{\sigma_1^2}{R_{11}} (R_{11} - R)$    by (a) of (25).

Hence we have

(28)    $\sigma_{E1} = \sigma_1 \left( 1 - \frac{R}{R_{11}} \right)^{1/2}.$

If this result is to correspond to $\sigma_{Ey} = \sigma_y r$ we would
expect that the factor $(1 - R/R_{11})^{1/2}$ would correspond in some
way with r. This is indeed the case and we shall now show that this
factor is the formula for the multiple correlation coefficient of $x_1$
on $x_2$ and $x_3$.
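
Formulas (26a) and (28) are easily evaluated together. The following sketch, offered only as an illustration, computes both from the three simple correlations and $\sigma_1$; the function name is our own, and the inputs are illustrative values (they are the correlations and $\sigma_1$ quoted in Example 4 of this chapter).

```python
import math


def estimate_errors(r12, r13, r23, s1):
    """Return (S_1.23, sigma_E1) following formulas (26a) and (28)."""
    R = 1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    R11 = 1.0 - r23 ** 2
    S = s1 * math.sqrt(R / R11)
    sigma_E = s1 * math.sqrt(1.0 - R / R11)
    return S, sigma_E


# Illustrative inputs: r12 = .60, r13 = .32, r23 = -.35, sigma_1 = 11.2.
S, sigma_E = estimate_errors(0.60, 0.32, -0.35, 11.2)
print(S, sigma_E)
```

The sketch assumes that the three correlations are mutually consistent, so that R is not negative, and that $r_{23}$ is not $\pm 1$, so that $R_{11}$ does not vanish.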

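5. Multiple Correlation Coefficient. The ordinary correlation
coefficient between the observed values of $x_1$ and its corresponding
estimated values calculated from (19) is called the multiple
correlation coefficient of $x_1$ on $x_2$ and $x_3$. It is denoted by
$r_{1.23}$, so we have

        $r_{1.23} = \frac{\sum {}_O x_1 \, {}_E x_1 \, f}{N \sigma_1 \sigma_{E1}},$

where ${}_O x_1$ and ${}_E x_1$ denote the observed and estimated
values, respectively, of $x_1$.

Using (19) this may be written in the form

        $N \sigma_1 \sigma_{E1} r_{1.23} = -\sigma_1 \sum x_1 \left( \frac{R_{12}}{R_{11}} \frac{x_2}{\sigma_2} + \frac{R_{13}}{R_{11}} \frac{x_3}{\sigma_3} \right) f$

        $= \frac{N \sigma_1^2}{R_{11}} (-R_{12} r_{12} - R_{13} r_{13})$

        $= \frac{N \sigma_1^2}{R_{11}} (R_{11} - R).$

Making use of (28) in the above result we have the required formula

(29)    $r_{1.23} = \left( 1 - \frac{R}{R_{11}} \right)^{1/2}.$

By a cyclical permutation of the subscripts we can write at once the
formulas for the multiple correlation coefficients of $x_2$ on $x_1$ and
$x_3$, and of $x_3$ on $x_1$ and $x_2$. They are

(30)    $r_{2.31} = \left( 1 - \frac{R}{R_{22}} \right)^{1/2},$

(31)    $r_{3.12} = \left( 1 - \frac{R}{R_{33}} \right)^{1/2}.$

By writing (26) in the form

        $S_{1.23}^2 = \sigma_1^2 \left\{ 1 - \left( 1 - \frac{R}{R_{11}} \right) \right\}$

we obtain the formula

(32)    $S_{1.23}^2 = \sigma_1^2 (1 - r_{1.23}^2)$

which is quite analogous to the expression for $S_y^2$ in simple
correlation. It is clear from (32) that

(33)    $-1 \le r_{1.23} \le 1.$
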
( 33 ) — 1 ^ 1.23 ^ !• 

Each of the formulas (29), (30), and (31) may be generalized for
k variables. Thus the multiple correlation coefficient of order k - 1
of $x_1$ with the other k - 1 variables is

(34)    $r_{1.23 \cdots k} = \left( 1 - \frac{R}{R_{11}} \right)^{1/2},$

where now $R_{ij}$ is the cofactor of $r_{ij}$ of R as defined in (23).
While a mathematical generalization gives a more complete and aesthetic
presentation, it is seldom that (22) or (34) is of value in practical
cases for more than four variables.

For computing purposes it is pleasant to know that multiple
correlation coefficients are expressible in terms of simple correlation
coefficients.


Example 1. Three variables have in pairs simple correlation coefficients given
by

        $r_{12} = .8, \quad r_{13} = -.7, \quad r_{23} = -.9.$

Find the multiple correlation coefficient $r_{1.23}$ of $x_1$ on $x_2$ and $x_3$.

Solution.

        $R = \begin{vmatrix} 1 & .8 & -.7 \\ .8 & 1 & -.9 \\ -.7 & -.9 & 1 \end{vmatrix} = .068,$

        $R_{11} = .19, \quad r_{1.23} = .8013.$

Example 2. Suppose it is found that $r_{12} = .6$, $r_{13} = -.4$, $r_{23} = .7$.
Comment on these results.

Solution. $R = -.346$, $R_{11} = .51$, $r_{1.23} = 1.29$. This value
contradicts (33), so the three coefficients cannot all be correct.
Inspecting the given r's we observe that large values of $x_1$ are
associated with large values of $x_2$, but since $r_{13}$ is negative it
would mean that small values of $x_1$ go with large values of $x_3$,
which is impossible when $r_{12}$ and $r_{23}$ are positive and as large
as they are here.
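
Examples 1 and 2 can be checked mechanically. The sketch below evaluates (29) for three variables and rejects a set of correlations for which R is negative; the function name and the choice to raise an error are our own assumptions, not part of the theory.

```python
import math


def multiple_r(r12, r13, r23):
    """Multiple correlation r_1.23 from (29); rejects inconsistent r's."""
    R = 1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    R11 = 1.0 - r23 ** 2
    if R < 0.0:
        raise ValueError("inconsistent correlations: R = %.3f is negative" % R)
    return math.sqrt(1.0 - R / R11)


example_1 = multiple_r(0.8, -0.7, -0.9)   # compare with .8013 in Example 1

try:
    multiple_r(0.6, -0.4, 0.7)            # the data of Example 2
    example_2_consistent = True
except ValueError:
    example_2_consistent = False

print(example_1, example_2_consistent)
```

A negative R is a convenient warning that one or more of the reported correlations is in error.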


6. Limiting Cases. The following theorems are interesting in 
themselves and shed light on interpretations of the theory in applica- 
tions. 

Theorem I. The necessary and sufficient condition for coincidence
of the three regression planes (19), (20), and (21), is

(35)    $r_{12}^2 + r_{13}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23} = 1.$

Proof. From elementary analytic geometry, we know that a
necessary and sufficient condition that two equations of the first
degree represent the same plane is that their coefficients be
proportional. For our equations this will be true when

        $\frac{R_{11}}{R_{12}} = \frac{R_{12}}{R_{22}} = \frac{R_{13}}{R_{23}}$

and

        $\frac{R_{12}}{R_{13}} = \frac{R_{22}}{R_{23}} = \frac{R_{23}}{R_{33}}.$

When expressed in terms of the $r_{ij}$ these relations, it will be
found, all satisfy (35).

An alternate proof is as follows. When $S_{1.23} = 0$ there is perfect
functional dependence between the variables, assuming linear
regression. It is evident from (26) that $S_{1.23}^2 = 0$ when R = 0.
Upon expanding R in terms of $r_{ij}$ and equating the result to zero we
obtain (35).

Corollary. Assuming linear regression, the criterion for perfect
correlation between three variables is given by (35).

Example 3. Given the following data, $r_{12} = .6$, $r_{13} = .4$. Find the
value of $r_{23}$ in order that $r_{1.23} = 1$.

Solution. Substituting the given values in (35) we have

        $r^2 - .48r - .48 = 0,$

where the subscripts are dropped for the moment. Solving, we find
$r = .24 \pm .73$. So $r_{23} = .97$; the other root, $r_{23} = -.49$,
also satisfies (35).
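
The quadratic of Example 3 can be solved for any pair $r_{12}$, $r_{13}$. The following sketch (function name ours) returns both roots of (35) regarded as an equation in $r_{23}$; the quantity under the radical equals $(1 - r_{12}^2)(1 - r_{13}^2)$ and so is never negative.

```python
import math


def r23_for_perfect_fit(r12, r13):
    """Roots r23 of (35): r23^2 - 2*r12*r13*r23 + (r12^2 + r13^2 - 1) = 0."""
    half_b = r12 * r13
    disc = half_b ** 2 - (r12 ** 2 + r13 ** 2 - 1.0)
    root = math.sqrt(disc)
    return half_b - root, half_b + root


low, high = r23_for_perfect_fit(0.6, 0.4)   # the data of Example 3
print(low, high)
```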

The example shows that even though $r_{12}$ and $r_{13}$ are individually
small, it does not follow that there cannot be high correlation between
$x_1$, $x_2$, and $x_3$. Indeed two variables which individually with a
third variable have correlations which are apparently worthless for
predicting purposes may be very valuable when the three variables are
taken together and multiple regression employed. On the other
hand, it may be possible to get as good a prediction from $r_{12}$ or
$r_{13}$ using simple regression as from multiple regression. This
situation will be clarified by the following theorems.

Theorem II. If $r_{23} = 1$, then $r_{1.23}^2 = r_{12}^2$ and
$S_{1.23}^2 = \sigma_1^2 (1 - r_{12}^2)$.

Proof. When $r_{23} = 1$ then $R_{11} = 0$ and it would appear from
(29) that $r_{1.23}$ then becomes infinite. But this is impossible by
(33). When $r_{23} = 1$ it will also happen that $r_{12} = r_{13}$. The
student can easily verify this by letting $r_{23} = 1$ in (25) and
subtracting (c) from (b) there. So we shall first see what (29) becomes
when $r_{13} = r_{12}$. If in (a) of (25) we let $r_{13} = r_{12}$ we
obtain $R - R_{11} = 2 r_{12} R_{12}$, since $R_{13}$ then equals
$R_{12}$. Substituting this result in (29) we soon have

        $r_{1.23}^2 = -\frac{2 r_{12} R_{12}}{R_{11}} = \frac{2 r_{12}^2}{1 + r_{23}},$

remembering that $r_{13} = r_{12}$. Now if we let $r_{23} = 1$ in the last
expression we obtain the first conclusion of the theorem. The second
conclusion follows from the first and formula (32).

In this case, then, multiple regression has no advantage over the
simple regression $x_1$ on $x_2$ or $x_1$ on $x_3$, because the standard
error is exactly what it would be if the third variable were not added.
Since $r_{23} = 1$, there is perfect linear dependence between $x_2$ and
$x_3$. Geometrically, all the data lie in the regression plane.



Theorem III. When $r_{23} = 0$ then $r_{1.23}^2 = r_{12}^2 + r_{13}^2$.

Proof. When $r_{23} = 0$ it is easy to show that $R_{11} = 1$ and
$R = 1 - r_{12}^2 - r_{13}^2$. So from (29) we have

        $r_{1.23}^2 = r_{12}^2 + r_{13}^2.$

The formula for the standard error of prediction then becomes

        $S_{1.23}^2 = \sigma_1^2 (1 - r_{12}^2 - r_{13}^2).$

Hence, when $x_2$ and $x_3$ are completely independent, multiple
regression gives a better prediction than would be given by either of
the simple regressions $x_1$ on $x_2$ or $x_1$ on $x_3$; very much better
if also $r_{12}$ and $r_{13}$ are nearly equal. If they are exactly equal
their maximum value is $1/\sqrt{2} = .707$. This theorem shows that one
has a good regression equation for predicting when each of two
variables is highly correlated with the third variable but not with
each other.
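
The contrast between Theorems II and III may be seen numerically. In the sketch below (function name and input values are illustrative assumptions) the same pair $r_{12} = r_{13} = .6$ is combined first with $r_{23} = 0$ and then with $r_{23}$ nearly equal to 1.

```python
def multiple_r_squared(r12, r13, r23):
    """r_1.23 squared, that is 1 - R/R11, from formula (29)."""
    R = 1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    R11 = 1.0 - r23 ** 2
    return 1.0 - R / R11


# Theorem III: x2 and x3 uncorrelated, so the squared correlations add.
uncorrelated = multiple_r_squared(0.6, 0.6, 0.0)
# Theorem II: x2 and x3 almost perfectly correlated, so little is gained
# over the simple regression, for which r12 squared is .36.
nearly_collinear = multiple_r_squared(0.6, 0.6, 0.999)
print(uncorrelated, nearly_collinear)
```

In the second case the result agrees with the intermediate expression $2 r_{12}^2/(1 + r_{23})$ obtained in the proof of Theorem II.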

7. Partial Correlation. It is often important to measure
correlation between two variables when the other variables have
assigned values. For the case of three variables, to which we limit our
attention, consider a slab parallel to the $x_1 x_2$ plane (Figure 13).
This is a sub-set of N which forms a two-way correlation table in which
the relations between $x_1$ and $x_2$ hold for a fixed value of $x_3$.
The correlation coefficient between $x_1$ and $x_2$ in this
sub-distribution is called the partial correlation coefficient between
$x_1$ and $x_2$ for the assigned $x_3$ and is conventionally denoted by

        $r_{12.3}.$

The regression curves for the table consisting of this sub-distribution
are called the partial regression curves. A classical example of a
partial correlation coefficient is the correlation between statures of
fathers and sons when the stature of the mother is a particular value,
say 62 inches.

In order to express $r_{12.3}$ in terms of the total correlations
$r_{ij}$, as we were able to do in the case of $r_{1.23}$, it will be
necessary to assume a theoretical or ideal situation. Suppose we are
dealing with a distribution for which the total regression curves are
straight lines and the regression surfaces are planes. Then the partial
regression line, $x_1$ on $x_2$, in our table at $x_3$ will be a section
of the regression plane, $x_1$ on $x_2 x_3$, because the line will
contain the mean points of all the $x_1$ columns, defined by the points
$(x_2, x_3)$, which lie in the table at $x_3$.

In the two variable case, described in Part I, we learned that
$S_y^2$ was an average of the variances in the x arrays of y taken over
all the values of x. Moreover, when the distribution was normal
we proved that these variances were constant and $S_y^2$ was precisely
this constant variance. The three variable case, in the ideal
distribution we are about to consider, is quite analogous. Recall that
$S_{1.23}^2$ could be regarded, in the ordinary case of linear
regression, as an average of the variances of $x_1$ in the several
columns at $(x_2, x_3)$ since, when regression is linear, the means of
the columns lie on the regression plane. Now let us assume that the
distribution is homoscedastic in the $x_1$ direction so that the
variances in all the columns of $x_1$ are the same. Under these
assumptions, $S_{1.23}^2$ is the variance in each column of $x_1$'s. Let
$\sigma_{1.3}^2$ be the variance of $x_1$ in the table at $x_3$. Remember
that $r_{12.3}$ is the correlation coefficient in this table and that
regression is linear and homoscedastic. Therefore, for the variance
$S_{1.23}^2$ in each of the columns of this table we may write

(36)    $S_{1.23}^2 = \sigma_{1.3}^2 (1 - r_{12.3}^2).$

Now consider the two-way table of totals $f(x_1, x_3)$. In this table,
$r_{13}$ is the total correlation between $x_1$ and $x_3$, and
$\sigma_{1.3}^2$ is the variance in an $x_3$ array of $x_1$'s. Since,
under our assumption, $\sigma_{1.3}^2$ is constant over these arrays, we
may write

(37)    $\sigma_{1.3}^2 = \sigma_1^2 (1 - r_{13}^2).$

From (32), (36), and (37) we obtain

        $(1 - r_{1.23}^2) = (1 - r_{13}^2)(1 - r_{12.3}^2),$

that is

        $\frac{R}{R_{11}} = R_{22} (1 - r_{12.3}^2).$

Solving, we have

        $r_{12.3}^2 = \frac{R_{11} R_{22} - R}{R_{11} R_{22}}.$

By expanding the R's it is readily verified that

        $R_{11} R_{22} - R = (-R_{12})^2$

is an identity. Therefore, we have the final result

(38)    $r_{12.3} = \frac{-R_{12}}{(R_{11} R_{22})^{1/2}}.$

This may be written, if desired, in the form

(38a)   $r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\{(1 - r_{13}^2)(1 - r_{23}^2)\}^{1/2}}.$

By letting $\sin \theta = r$, it is seen that tables of
$\cos \theta = (1 - r^2)^{1/2}$ will facilitate the computation of (38a)
in numerical problems.

Since (38a) does not involve $x_3$, the value of $r_{12.3}$ for one
assignment of $x_3$ is the same as for any other assigned value of
$x_3$. Therefore, not only must the distribution be homoscedastic in the
$x_1$ direction, but also the value of $r_{12.3}$ in all slabs
perpendicular to the $x_3$-axis must be the same. It is fairly obvious
that these conditions would not, ordinarily, be satisfied in practical
applications. So, in the applications, $r_{12.3}$ is regarded as a sort
of average value of the partial correlations which could be obtained
for all assignments of $x_3$. The chief use of partial correlation is in
testing what the correlation between two variables would be if the
third variable were not interfering with the relationship.

Example 4. In a study of the factors which influence "academic success,"
May* obtained the following results (among others) based on the records of 450
students at Syracuse University.

    $x_1$ = honor points,   $x_2$ = general intelligence,   $x_3$ = hours of study,
    $\bar{X}_1 = 18.5$,     $\bar{X}_2 = 100.6$,            $\bar{X}_3 = 24$,
    $\sigma_1 = 11.2$,      $\sigma_2 = 15.8$,              $\sigma_3 = 6$,
    $r_{12} = .60$,         $r_{13} = .32$,                 $r_{23} = -.35$.

One purpose of the study was to find to what extent honor points were related
to general intelligence, when hours of study (per week) are held constant. Using
(38a) it is found that $r_{12.3} = .802$.
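
The computation in Example 4 may be reproduced with a few lines. The sketch below is a direct transcription of (38a); the function name is our own.

```python
import math


def partial_r(r12, r13, r23):
    """Partial correlation r_12.3 from formula (38a)."""
    return (r12 - r13 * r23) / math.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))


r12_3 = partial_r(0.60, 0.32, -0.35)   # compare with .802 in Example 4
print(round(r12_3, 3))
```

The formula fails when $r_{13}$ or $r_{23}$ equals $\pm 1$, since the denominator then vanishes.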


8. An Alternate Derivation. It is useful to approach the subject
of partial correlation from another point of view. Assume, as before,
that the variables $x_1$, $x_2$, $x_3$ are referred to the general mean
as origin. Suppose that we wish to know what the correlation between
$x_1$ and $x_2$ would be if the influence of $x_3$ were eliminated. Let
us subtract from the $x_1$ of each point that part of $x_1$ which is due
to the influence of $x_3$ as indicated by the regression line $x_1$ on
$x_3$ and denote the residual by $x_{1.3}$. Then subtract from the $x_2$
of each point that part of $x_2$ which is due to $x_3$ as indicated by
the regression line $x_2$ on $x_3$ and denote the residual by $x_{2.3}$.
Thus we have

(39)    $x_{1.3} = x_1 - r_{13} \frac{\sigma_1}{\sigma_3} x_3,$

        $x_{2.3} = x_2 - r_{23} \frac{\sigma_2}{\sigma_3} x_3.$


We shall now prove that the simple correlation coefficient between
$x_{1.3}$ and $x_{2.3}$ is precisely $r_{12.3}$. By definition, this
simple correlation coefficient would be

(40)    $\frac{\sum x_{1.3} x_{2.3} f(x_1, x_2, x_3)}{N \sigma_{1.3} \sigma_{2.3}}.$

Making use of (39), the numerator of (40) becomes

        $\sum x_1 x_2 f - r_{13} \frac{\sigma_1}{\sigma_3} \sum x_2 x_3 f - r_{23} \frac{\sigma_2}{\sigma_3} \sum x_1 x_3 f + r_{13} r_{23} \frac{\sigma_1 \sigma_2}{\sigma_3^2} \sum x_3^2 f$

        $= N (\sigma_1 \sigma_2 r_{12} - \sigma_1 \sigma_2 r_{13} r_{23} - \sigma_1 \sigma_2 r_{13} r_{23} + \sigma_1 \sigma_2 r_{13} r_{23})$

        $= N \sigma_1 \sigma_2 (r_{12} - r_{13} r_{23}).$

* Predicting Academic Success, Mark A. May, Journal Educational
Psychology, 1923, vol. 14, 7, 429-440.

Now by (37),

        $\sigma_{1.3} = \sigma_1 (1 - r_{13}^2)^{1/2},$

and similarly,

        $\sigma_{2.3} = \sigma_2 (1 - r_{23}^2)^{1/2}.$

Inserting these results in (40) we obtain the promised result

        $r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\{(1 - r_{13}^2)(1 - r_{23}^2)\}^{1/2}}.$

When interpreted according to this derivation, $r_{12.3}$ is sometimes
called the "net" correlation between $x_1$ and $x_2$.
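
The equivalence just proved holds for any set of observations, so it can be checked numerically. In the sketch below the residuals of (39) are formed from the least-squares lines and their simple correlation is compared with (38a). The helper names are our own, and the eight observations are used only as convenient numbers (they are the first eight rows of the table in Exercise 5 of this chapter).

```python
import math


def mean(v):
    return sum(v) / len(v)


def corr(u, v):
    """Simple correlation coefficient of two equal-length lists."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def residuals(y, x):
    """Residuals of y about its least-squares line on x, as in (39)."""
    my, mx = mean(y), mean(x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx          # equals r * sigma_y / sigma_x
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]


# Illustrative observations only.
x1 = [87.0, 133.0, 174.0, 385.0, 363.0, 274.0, 235.0, 104.0]
x2 = [40.0, 36.0, 34.0, 41.0, 39.0, 42.0, 40.0, 31.0]
x3 = [11.0, 13.0, 19.0, 33.0, 25.0, 23.0, 22.0, 9.0]

r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
by_formula = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
by_residuals = corr(residuals(x1, x3), residuals(x2, x3))
print(by_formula, by_residuals)
```

The two printed values should agree to within rounding error.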

Interesting interpretations of multiple and partial correlation
in terms of spherical trigonometry will be found in the following
references:

1. Burgess, The Mathematics of Statistics, pp. 266-267; Houghton Mifflin Co.

2. Jackson, The Trigonometry of Correlation, Journal of the American
   Mathematical Association, vol. 31, pp. 275-280.


Exercises 

1. Find the multiple correlation coefficients and the regression equations for
   the data in Example 4.

2. (Garrett) The r for intelligence and school achievement in a group of
   children 8 to 14 years old is .80. The r for intelligence and age in the
   same group is .70. The r for school achievement and age is .60. What
   will be the correlation between intelligence and school achievement in
   children of the same age?

3. (Yule and Kendall) The following means, standard deviations, and
   correlations are found for

       $X_1$ = seed-hay crops in cwts. per acre,
       $X_2$ = spring rainfall in inches,
       $X_3$ = accumulated temperature above 42° F. in spring,

   in a certain district in England during 20 years.

       $\bar{X}_1 = 28.02$,   $\sigma_1 = 4.42$,   $r_{12} = .80$,
       $\bar{X}_2 = 4.91$,    $\sigma_2 = 1.10$,   $r_{13} = -.40$,
       $\bar{X}_3 = 594$,     $\sigma_3 = 85$,     $r_{23} = -.56$.

   Find the partial correlations and the regression equation for hay crop on
   spring rainfall and accumulated temperature.

4. Derive and explain the relation $\sigma_1^2 = \sigma_{E1}^2 + S_{1.23}^2$.
   What is the corresponding relation in simple correlation?

5. The following data relate to land values and crops in twenty-five Iowa
   counties.

       $X_1$ = average value per acre of farm land on January 1, 1920,
       $X_2$ = average yield of corn per acre in bushels 1910-1919,
       $X_3$ = per cent of farm land in small grain,
       $X_4$ = per cent of farm land in corn.

       County No.     X1     X2     X3     X4
            1        $ 87    40     11     14
            2         133    36     13     30
            3         174    34     19     30
            4         385    41     33     39
            5         363    39     25     33
            6         274    42     23     34
            7         235    40     22     37
            8         104    31      9     20
            9         141    36     13     27
           10         208    34     17     40
           11         115    30     18     19
           12         271    40     23     31
           13         163    37     14     25
           14         193    41     13     28
           15         203    38     24     31
           16         279    38     31     35
           17         179    24     16     26
           18         244    45     19     34
           19         165    34     20     30
           20         257    40     30     38
           21         252    41     22     35
           22         280    42     21     41
           23         167    35     16     23
           24         168    33     18     24
           25         115    36     18     21

   (a) Find the linear regression equation of $X_1$ on $X_2 X_3 X_4$.

   (b) Estimate the first five values of $X_1$, using the equation obtained in (a).

   (c) Calculate $S_{1.234}$ and $r_{1.234}$.


CHAPTER VI 


FUNDAMENTALS OF SAMPLING THEORY WITH SPECIAL 
REFERENCE TO THE MEAN 

1. Introduction.* To emphasize the viewpoint of the subject of 
this chapter it is convenient to recognize two general classes of prob- 
lems in mathematical statistics. In problems of the first class our 
concern is largely with the exposition of methods of characterizing 
observed data. Thus in the first class would fall methods for sum- 
marizing the pertinent information in a set of variates by means of 
averages, measures of dispersion, indices of correlation, etc. In 
problems of the second class, however, the data at hand are regarded 
as a random sample drawn from a well-defined class of variates called 
the population or universe of discourse, and we are concerned with 
drawing inferences about the universe from the sample. By a sample, 
more precisely a random sample, we mean a sub-set of variates in 
which each individual from the universe has an equal and independent 
chance to be included. From this chosen sample we attempt to draw 
inferences concerning the universe. In order to deal with this induc- 
tive argument we first consider a deductive argument; that is, 
we first consider an infinite (or finite) universe and investigate the 
behavior of samples according to the laws of probability. The 
methodology dealing with this class of problems is known as sampling 
theory. Although the two classes of problems are not entirely dis- 
tinct with regard to their treatment, the center of interest in sampling 
theory is the development of criteria for assisting common sense or 
educated judgment concerning the magnitude of chance fluctuations 
in statistical ratios, averages, and coefficients. 

The Bernoulli theory deals with sampling fluctuations in relative
frequencies. In the words of Professor Rietz,^

But it is fairly obvious that the interest of the statistician in the effects of
sampling fluctuations extends far beyond the fluctuations in relative frequencies.
To illustrate, suppose we calculate any statistical measure such as an arithmetic
mean, median, standard deviation, correlation coefficient, or parameter of a
frequency function from the actual frequencies given by a sample of data. If we
need then either to form a judgement as to the stability of such results from sample
to sample or to use the results in drawing inferences about the sampled population,
the common sense process of induction involved is much aided by a knowledge of
the general order of magnitude of the sampling discrepancies which may
reasonably be expected because of the limited size of the sample from which we have
calculated our statistical measures.

* A reference list is given at the end of each of the following chapters to which
attention is directed in the course of the discussion by the use of superscripts.

A statistical measure calculated from the actual frequencies given
by a sample has been called a statistic by R. A. Fisher.^ This is to
avoid a verbal confusion with the corresponding parameter in the
universe which we should like to know but can generally only
estimate. It is a matter of common experience that a statistic will
vary from sample to sample. To characterize the variation that
may be tolerated on the basis of chance is one of the fundamental
problems of sampling theory.

In discussing such sampling fluctuations, Fisher^ introduces the 
subject as follows: 

The idea of an infinite population distributed in a frequency distribution in 
respect of one or more characters is fundamental to all statistical work. From 
a limited experience, for example, of individuals of a species, or of the weather 
of a locality, we may obtain some idea of the infinite hypothetical population 
from which our sample is drawn, and so of the probable nature of future samples 
to which our conclusions are to be applied. If a second sample belies this ex- 
pectation we infer that it is, in the language of statistics, drawn from a different 
population; that the treatment to which the second sample of organisms had 
been exposed did in fact make a material difference, or that the climate (or 
methods of measuring it) had materially altered. Critical tests of this kind 
may be called tests of significance, and when such tests are available we may 
discover whether a second sample is or is not significantly different from the first. 

2. Method of Attack. The whole theory of sampling is based on 
frequency distributions and probability. In order to explain the 
tests of significance that have been developed, it is desirable to out- 
line briefly the philosophy underlying the method of attack. 

Sampling theory deals with specific questions like the following:
Given the mean and standard deviation of a sample of N variates,
how reliable are these estimates of the population mean and standard
deviation, respectively? Given two samples, do their respective
means or other statistics differ significantly? Can the differences
be accounted for on the basis of chance or do the samples come from
different populations? The answers require in general that we
conceive the universe as one distribution, the values of the statistic
calculated from all possible samples of size N from that universe
as another distribution, and that there are mathematical expressions
capable of representing both distributions. This is the chief reason
for studying frequency curves and probability distributions.

Suppose, for example, that we have computed a statistic, say the
mean of 100 observations or measurements. What we get is not an
absolutely fixed quantity which may be exactly reproduced again
by taking 100 similar measurements. Indeed, if such an experiment
were repeated many times, we would get values for the arithmetic
mean which would form a frequency distribution. This distribution
would have its own mean (mean of means), standard deviation, and
higher moments. The law describing the frequency distribution of
all possible means of samples of size N from a specified universe is
called a distribution function when it can be expressed
mathematically. Its graph is called the curve of means. What has been
said of the mean holds similarly for any other statistic.
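
The idea of a distribution of all possible means can be made concrete with a very small universe. The sketch below is an illustration under stated assumptions: the universe of four equally likely values and the sample size N = 2 are hypothetical, and samples are taken as ordered draws with replacement so that every sample is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

universe = [1, 2, 3, 4]   # hypothetical finite universe, values equally likely
N = 2                     # sample size

# Every ordered sample of size N drawn with replacement is equally likely.
means = [Fraction(sum(s), N) for s in product(universe, repeat=N)]
distribution = Counter(means)   # frequency distribution of the sample mean

universe_mean = Fraction(sum(universe), len(universe))
mean_of_means = sum(means) / len(means)

for value in sorted(distribution):
    print(value, distribution[value])
print("mean of means:", mean_of_means, "universe mean:", universe_mean)
```

The table printed is the "curve of means" for this universe in tabular form, and its mean coincides with the mean of the universe.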

Formulation of statistical judgment about a sample involves the
specification of the universe and the determination of the distribution
function of a given statistic in samples of a given size drawn from
this universe. The problem of determining the distribution functions
for the various statistics from specified universes is one which has
challenged modern mathematical research. In most cases it has
been necessary to assume that the parent universe is of the normal
form in order to obtain analytically the sampling distribution of the
statistic. Many of the tests of significance are based upon this
assumption. However, considerable information about sampling
distributions from arbitrary universes is known in terms of their
moments or expected values.

3. Expected Values. Let the continuous variable x be subject to
the distribution function $f(x)$ and let $\phi(x)$ be an arbitrary
function of x. Then the expected value of $\phi(x)$, denoted by
application of the operator E, is defined by

(1)     $E[\phi(x)] = \int_{-\infty}^{\infty} \phi(x) f(x)\, dx,$

provided this integral exists. In particular, if $\phi(x) = x^k$,
$(k = 1, 2, \cdots)$, we have

        $E(x^k) = \int_{-\infty}^{\infty} x^k f(x)\, dx.$

For k = 1, this defines the mean of the x's in the universe represented
by $f(x)$. Hereafter we will denote the mean of a universe of x's by
$\tilde{x}$ and restrict $\bar{x}$ to denote the mean of a sample from
that universe. Therefore, we may write*

(2)     $E(x) = \tilde{x}.$

If $\phi(x) = (x - \tilde{x})^2$ we have the variance of x,

(3)     $\sigma_x^2 = E(x - \tilde{x})^2 = E(x^2) - \tilde{x}^2.$

The (positive) square root of $\sigma_x^2$ is called the standard
deviation or standard error of the distribution of x. Analogous
definitions hold, of course, for y.
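
The definitions above are stated for a continuous variable. As a hedged illustration only, the following sketch applies the same operator to a discrete universe, with sums in place of integrals (an assumption made for simplicity), and checks the two forms of the variance in (3). The universe chosen, one throw of a fair die, is hypothetical.

```python
from fractions import Fraction


def expected(phi, dist):
    """Expected value of phi(x) for a discrete distribution {x: probability}."""
    return sum(phi(x) * p for x, p in dist.items())


# Hypothetical universe: one throw of a fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mean = expected(lambda x: x, die)
variance = expected(lambda x: (x - mean) ** 2, die)
shortcut = expected(lambda x: x ** 2, die) - mean ** 2
print(mean, variance, shortcut)
```

Exact fractions are used so that the two expressions for the variance can be compared without rounding.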

If the variables x and y are simultaneously distributed in accord
with the function $f(x, y)$, then

        $E(xy) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy\, f(x, y)\, dy\, dx.$

If x and y are not independent variables in the probability sense,
then, as we have seen in Chapter IV, $f(x, y) \ne g(x) h(y)$ where
$g(x)$ and $h(y)$ are the marginal distributions of x and y,
respectively. The correlation coefficient, $\rho$, between x and y in
the bivariate universe represented by $f(x, y)$ is defined by

(4)     $\rho = \frac{E(xy) - \tilde{x}\tilde{y}}{\sigma_x \sigma_y}.$

The quantities $\tilde{x}$ (the universe mean), $\sigma$, $\rho$, etc.,
relating to a universe are called parameters.

The following propositions may easily be established from pre- 
ceding definitions so they are stated without proof. 

I. The expected value of the product of a variable and a constant is 
equal to the product of the constant and the expected value of the variable. 
That is, 

E(cx) = cE(x), 

II. The expected value of deviations of a variable from its expected
value is zero. That is,

$E(x - \tilde{x}) = 0.$

* $\tilde{x}$ is read "x tilde."

III. The expected value of the sum of two or more variables is the sum
of their expected values. In symbols,

$E(x + y + z) = E(x) + E(y) + E(z).$

IV. If x and y are mutually independent variables in the probability
sense, then the expected value of their product is equal to the product of
their expected values. That is,

$E(xy) = E(x)E(y).$

V. The expected value of the product of deviations of two mutually
independent variables from their respective expected values is zero.
That is,

$E\{(x - \tilde{x})(y - \tilde{y})\} = 0.$

VI. The expected value of the product of deviations of two correlated
variables from their respective expected values is given by

$E\{(x - \tilde{x})(y - \tilde{y})\} = \rho\,\sigma_x \sigma_y.$
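The propositions can be checked exactly on a small discrete universe. The following Python sketch is an illustration only: the 2 x 2 joint distribution `joint`, the marginals `g` and `h`, and the helper `expect` are hypothetical and do not come from the text.

```python
from fractions import Fraction as F
import math

# Hypothetical joint distribution f(x, y); the probabilities are illustrative only.
joint = {(0, 0): F(4, 10), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(4, 10)}

def expect(phi, dist):
    """Expected value of phi(x, y) under a discrete joint distribution."""
    return sum(p * phi(x, y) for (x, y), p in dist.items())

mx = expect(lambda x, y: x, joint)                  # universe mean of x
my = expect(lambda x, y: y, joint)                  # universe mean of y
var_x = expect(lambda x, y: (x - mx) ** 2, joint)   # definition (3)
var_y = expect(lambda x, y: (y - my) ** 2, joint)
e_xy = expect(lambda x, y: x * y, joint)

# Proposition II: deviations from the expected value average to zero.
mean_dev = expect(lambda x, y: x - mx, joint)

# Definition (4) and Proposition VI: E{(x - mx)(y - my)} = rho * sigma_x * sigma_y.
rho = float(e_xy - mx * my) / math.sqrt(float(var_x * var_y))
cov = expect(lambda x, y: (x - mx) * (y - my), joint)

# Propositions IV and V: an independent universe built from the same marginals.
g = {0: F(1, 2), 1: F(1, 2)}
h = {0: F(1, 2), 1: F(1, 2)}
indep = {(x, y): g[x] * h[y] for x in g for y in h}
e_xy_indep = expect(lambda x, y: x * y, indep)
cov_indep = expect(lambda x, y: (x - mx) * (y - my), indep)

print(mean_dev, rho, cov, e_xy_indep, cov_indep)
```

Exact fractions are used so that the equalities in Propositions II, IV, V, and VI hold without rounding error; only the correlation coefficient, which needs a square root, is computed in floating point.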

4. Standard Error of a Linear Function of Variables. Suppose
a variable is a linear function of two or more independent * variables
each of which may take on a universe of values and we require the
standard error of this function in terms of certain moments of the
underlying distributions of independent variables. To this end let

(5) $w = c_1x_1 + c_2x_2 + \cdots + c_Nx_N$

where each variable $x_k$, (k = 1, 2, ..., N), is arbitrarily distributed
and where the c's are arbitrary constants. Let $\sigma_k$ represent the
standard error of $x_k$ in the universe to which it belongs, and let $\rho_{ij}$
represent the correlation coefficient (if any correlation exists) between
$x_i$ and $x_j$. We seek the standard error of w, $\sigma_w$, in terms of $\sigma_k$ and
$\rho_{ij}$, (i = 1 to N, j = 1 to N).

Case I. We will suppose first that the variables in the several
universes are correlated, that is, that $\rho_{ij} \neq 0$ for every combination
of i and j. From (5) and Proposition III we have

(6) $E(w) = c_1E(x_1) + c_2E(x_2) + \cdots + c_NE(x_N),$

that is

(7) $\tilde{w} = c_1\tilde{x}_1 + c_2\tilde{x}_2 + \cdots + c_N\tilde{x}_N.$

* We are using the phrase independent variables here in the ordinary
sense of analysis to designate the variables on which a special function depends,
without any implication that these variables are independent of each other in
the statistical sense.

Then

$E(w - \tilde{w})^2 = \sum_i c_i^2 E(x_i - \tilde{x}_i)^2 + \sum_{i \neq j} c_i c_j E\{(x_i - \tilde{x}_i)(x_j - \tilde{x}_j)\}$

which by definition (3) and Proposition VI becomes

(8) $\sigma_w^2 = \sum_i c_i^2\sigma_i^2 + \sum_{i \neq j} c_i c_j \rho_{ij}\sigma_i\sigma_j.$

If $c_1 = 1$, $c_2 = \pm 1$, and N = 2, we have as a special case

(9) $\sigma_w^2 = \sigma_1^2 \pm 2\rho_{12}\sigma_1\sigma_2 + \sigma_2^2.$

Case II. Suppose the x's in (5) are mutually independent in the
statistical sense so that $\rho_{ij} = 0$. Then (8) becomes

(10) $\sigma_w^2 = c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \cdots + c_N^2\sigma_N^2.$
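Relations (8) and (10) translate directly into a short computation. The Python sketch below is illustrative; the function name `linear_variance` and the numerical values are assumptions, not part of the text. The double sum runs over all ordered pairs with i different from j, so each pair of variables contributes twice, which is what produces the factor 2 in (9).

```python
def linear_variance(c, sigma, rho=None):
    """Variance of w = sum(c[i] * x[i]).

    With a correlation matrix rho this is relation (8); with rho omitted the
    variables are treated as independent and the result is relation (10).
    """
    n = len(c)
    var = sum(c[i] ** 2 * sigma[i] ** 2 for i in range(n))
    if rho is not None:
        for i in range(n):
            for j in range(n):
                if i != j:  # ordered pairs, so (i, j) and (j, i) both count
                    var += c[i] * c[j] * rho[i][j] * sigma[i] * sigma[j]
    return var

rho12 = [[1.0, 0.5], [0.5, 1.0]]                          # hypothetical correlation
var_diff = linear_variance([1, -1], [3, 4], rho12)        # (9) with the minus sign
var_sum = linear_variance([1, 1], [3, 4], rho12)          # (9) with the plus sign
var_indep = linear_variance([1, -1], [3, 4])              # (10)
var_mean = linear_variance([0.25] * 4, [2, 2, 2, 2])      # mean of 4 independent x's
print(var_diff, var_sum, var_indep, var_mean)
```

The last line anticipates the use made of (10) for sample means: with every c equal to 1/N and every standard error equal to 2, the variance is 4/4 = 1.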

5. Theorems. Relations (6) to (10) enable us to prove some interesting
and useful theorems about the distribution of means of
samples from an arbitrary universe. The following definition will
make the notion of sample precise.

Definition. Let $(x_1, x_2, \ldots, x_N)$ be a set of N independent variables
each subject to the same distribution function g, so that their joint
distribution function is

$f(x_1, x_2, \ldots, x_N) = g(x_1)g(x_2) \cdots g(x_N).$

Then $(x_1, x_2, \ldots, x_N)$ is called a random sample of N from a universe
with distribution function $g(x)$.

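Table 6 exhibits the notation which will be used for the moments
of the several distributions referred to in Theorems I to III.


Table 6. Notation

                      Universe                 Sample            Distribution of Means
Mean                  $\tilde{x}$              $\bar{x}$         $E(\bar{x}) = \tilde{x}$
Standard Deviation    $\sigma_x$               $s$               $\sigma_{\bar{x}}$
Variance              $\sigma_x^2$             $s^2$             $\sigma_{\bar{x}}^2$
Skewness              $\tilde{\alpha}_{3:x}$   $\alpha_{3:x}$    $\tilde{\alpha}_{3:\bar{x}}$
Kurtosis              $\tilde{\alpha}_{4:x}$   $\alpha_{4:x}$    $\tilde{\alpha}_{4:\bar{x}}$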

Theorem I. If samples of size N be drawn from an arbitrary
universe and if $\bar{x}$ be the mean of a sample, then the mean of all possible
such means equals the mean of the universe. That is,

(11) $E(\bar{x}) = \tilde{x}.$

Proof. In (5), let $c_1 = c_2 = \cdots = c_N = 1/N$ and let $x_1, x_2, \ldots, x_N$
constitute a sample from a universe with mean $\tilde{x}$ and variance
$\sigma_x^2$. Then $w = \bar{x}$. As a consequence of the definition of sample,
$E(x_i) = \tilde{x}$ for each value of i from one to N. Therefore, (6) gives us
$E(\bar{x}) = \tilde{x}$.

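Theorem II. The variance of the sampling distribution of means
from an arbitrary universe equals the variance of the universe divided
by the number in the samples. In symbols,

(12) $\sigma_{\bar{x}}^2 = \dfrac{\sigma_x^2}{N}.$

Hence,

(12a) $\sigma_{\bar{x}} = \dfrac{\sigma_x}{N^{1/2}}.$

Proof. As in the proof of Theorem I let $w = \bar{x}$. Then (10)
becomes

(13) $\sigma_{\bar{x}}^2 = \dfrac{1}{N^2}(\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_N^2).$

Since the x's constitute a sample, $\sigma_i^2 = \sigma_x^2$ for each value of i from
1 to N. So (13) reduces to (12).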

Theorem III. The moments describing skewness and kurtosis in the
sampling distribution of means are related to the corresponding moments
in the universe by the following formulas:

(14) $\tilde{\alpha}_{3:\bar{x}} = \dfrac{\tilde{\alpha}_{3:x}}{\sqrt{N}}, \qquad \tilde{\alpha}_{4:\bar{x}} = 3 + \dfrac{1}{N}(\tilde{\alpha}_{4:x} - 3).$

A proof of (14) could be given by developing and applying additional
propositions on expected values. However, this method is
tedious for the higher moments. A more elegant proof can be given
by means of characteristic functions. Such a proof has been made
available by Shewhart for the discrete case.

The first and second theorems show us that in repeated samples,
$\bar{x}$ is distributed about $\tilde{x}$ with standard deviation $\sigma_x/\sqrt{N}$. Theorem
III tells us something about the form of the distribution. Thus if
the universe is normal so that $\tilde{\alpha}_{3:x} = 0$ and $\tilde{\alpha}_{4:x} = 3$, then from (14)
we see that $\tilde{\alpha}_{3:\bar{x}} = 0$ and $\tilde{\alpha}_{4:\bar{x}} = 3$, so the sampling distribution of $\bar{x}$
from a normal universe has the normal values for skewness and kurtosis.

In the next three theorems it will be understood that x and y are
correlated variables which are jointly distributed in accord with an
arbitrary function $f(x, y)$ in which the parameters are $\tilde{x}$, $\tilde{y}$, $\sigma_x$, $\sigma_y$,
and $\rho$.

Theorem IV. Let $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ be a sample of
N pairs drawn independently from the distribution characterized by
$f(x, y)$ and let $(\bar{x}, \bar{y})$ be the mean of a sample. The correlation coefficient,
R, between the means of all possible such samples equals $\rho$.

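Proof. By definition,

(15) $R = \dfrac{E(\bar{x}\bar{y}) - \tilde{x}\tilde{y}}{\sigma_{\bar{x}}\sigma_{\bar{y}}},$

and

$E(\bar{x}\bar{y}) = \dfrac{1}{N^2}E\{(x_1 + x_2 + \cdots + x_N)(y_1 + y_2 + \cdots + y_N)\} = \dfrac{E(S)}{N^2},$

where

$S = x_1y_1 + x_1y_2 + \cdots + x_1y_N$
$\quad + x_2y_1 + x_2y_2 + \cdots + x_2y_N$
$\quad + \cdots$
$\quad + x_Ny_1 + x_Ny_2 + \cdots + x_Ny_N.$

We will separate S into two parts, conveniently called u and v, where

$u = x_1y_1 + x_2y_2 + \cdots + x_Ny_N,$

and

v = sum of $(N^2 - N)$ terms of the form $x_iy_j$, $i \neq j$.

Then

$E(u) = E\{\sum_1^N (x_iy_i)\} = \sum_1^N \{E(x_iy_i)\} = NE(xy).$

In v, $x_i$ must be uncorrelated with $y_j$ since $i \neq j$ and the pairs are
drawn independently. Therefore,

$E(x_iy_j) = E(x_i)E(y_j) = \tilde{x}\tilde{y},$

and

$E(v) = (N^2 - N)\tilde{x}\tilde{y}.$

So we have

$E(S) = NE(xy) + (N^2 - N)\tilde{x}\tilde{y},$

and therefore

(16) $E(\bar{x}\bar{y}) = \dfrac{1}{N}\{E(xy) + (N - 1)\tilde{x}\tilde{y}\}.$

By (16) the numerator of (15) is $\{E(xy) - \tilde{x}\tilde{y}\}/N$, and by Theorem II
the denominator is $\sigma_x\sigma_y/N$, so the right member of (15) reduces
to the definition of $\rho$.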

Theorem V. Let $\bar{x}$ be the mean of a sample of N from $g(x)$ and let
$\bar{y}$ be the mean of a sample of N from $h(y)$ where $g(x)$ and $h(y)$ are the
marginal distributions of the universe characterized by $f(x, y)$ of correlated
variables. Let $w = \bar{x} - \bar{y}$. The variance of the sampling
distribution of w is

(17) $\sigma_w^2 = \dfrac{1}{N}(\sigma_x^2 - 2\rho\sigma_x\sigma_y + \sigma_y^2).$

The proof follows from (9) and Theorem IV.

Theorem VI. Let $\bar{x}$ and $\bar{y}$ be the means, $s_x$ and $s_y$ the standard
deviations, and r the correlation coefficient in a sample of N correlated
items. Suppose N is so large that $s^2$ is a good estimate * of $\sigma^2$ and r of $\rho$,
so that we may write

$\sigma_{\bar{x}}^2 = \dfrac{s_x^2}{N}, \qquad \sigma_{\bar{y}}^2 = \dfrac{s_y^2}{N}, \qquad \rho = r.$

The variance of the sampling distribution of $w = \bar{x} - \bar{y}$ may be computed
from the sample by the formula

(18) $\sigma_w^2 = \dfrac{1}{N}(s_x^2 - 2rs_xs_y + s_y^2).$

The proof follows from (17).
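Formula (18) is simple enough to evaluate directly. The Python sketch below is a minimal illustration; the function name `diff_of_means_variance` and the sample values are hypothetical. It shows how a positive correlation between the paired items reduces the variance of the difference of the means relative to the uncorrelated case.

```python
import math

def diff_of_means_variance(s_x, s_y, r, n):
    """Estimated variance of w = xbar - ybar from formula (18)."""
    return (s_x ** 2 - 2 * r * s_x * s_y + s_y ** 2) / n

# Hypothetical sample of 100 correlated pairs.
var_corr = diff_of_means_variance(3.0, 4.0, 0.5, 100)    # r = 0.5
var_uncorr = diff_of_means_variance(3.0, 4.0, 0.0, 100)  # r = 0
print(var_corr, math.sqrt(var_corr))       # variance and standard error, r = 0.5
print(var_uncorr, math.sqrt(var_uncorr))   # variance and standard error, r = 0
```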

6. An Experiment. We will now describe an exercise in experimental
sampling which will help make the theory more meaningful.
It was performed by a class of thirty students who took the
distribution of Table 7 as a universe.

Table 7. Span among Adult Males. (See Table 20, Part I)

   x       f
 58.5      1
 59.5      2
 60.5      1
 61.5      6
 62.5      7
 63.5     22
 64.5     55
 65.5    111
 66.5    146
 67.5    182
 68.5    229
 69.5    265
 70.5    263
 71.5    217
 72.5    176
 73.5    132
 74.5     82
 75.5     48
 76.5     20
 77.5     16
 78.5     12
 79.5      3
 80.5      1
 81.5      2
 82.5      1

In a box were placed 2000 discs † each bearing a number from the
set 1, 2, 3, ..., 25. The numbers on the discs were coded to the
span values in accordance with the scheme shown on page 107, and
the frequency of the variously numbered discs equaled the frequency
of the corresponding x's. Each member of the class drew
samples from the box according to the following directions.

* The problem of estimation is discussed in the next chapter.
† Small metal rimmed price tags were used. Ideally, each individual disc
should be returned to the box before the next is drawn. However, this was not
insisted upon and an entire sample may have been drawn before replacement.

Directions 

1. Intermix the discs thoroughly and withdraw four random
samples of ten discs each.

2. Record the numbers in each sample of ten on the sampling record
sheet (page 107); replace the discs in the box.

3. For each sample of ten: find (a) mean span, (b) variance, (c)
standard deviation.

4. Combine the four samples into a single sample of forty and
find the statistics named in 3.

Sampling Record Sheet

* In computing the statistics let x denote span and u the number on a disc. Then $u = x - 57.5$,
$\bar{x} = \bar{u} + 57.5$, and $s_x = s_u$.

The results of 3(a) will be reproduced here. There were, of 
course, 120 means from samples of N = 10. These were then 
grouped into a frequency distribution. The resulting distribution 
and its moments, together with the moments of the universe, are 
given in Table 8. (The computations were made according to the 
definitions given in Part I for the moments of an observed distri- 
bution.) 

Table 8. Distribution of the Means of 120 Samples of N = 10 Drawn
from the Universe of Span

Interval      Mid x    Frequency
67.0-67.3     67.15        1
67.4-67.7     67.55        1
67.8-68.1     67.95        4
68.2-68.5     68.35        4
68.6-68.9     68.75        5
69.0-69.3     69.15       19
69.4-69.7     69.55       27
69.8-70.1     69.95       20
70.2-70.5     70.35       20
70.6-70.9     70.75        7
71.0-71.3     71.15        6
71.4-71.7     71.55        3
71.8-72.1     71.95        3

Moments of the distribution of means: mean = 69.785; $s_{\bar{x}}$ = 0.894; $\alpha_{3:\bar{x}}$ = 0.052; $\alpha_{4:\bar{x}}$ = 3.030.
Moments of the universe: $\tilde{x}$ = 69.943; $\sigma_x$ = 3.115; $\tilde{\alpha}_{3:x}$ = 0.161; $\tilde{\alpha}_{4:x}$ = 3.296.

Although the chief purpose of the experiment is an appreciation
of the theory, it is of interest to compare the experimental and
theoretical results. According to Theorem I the mean should be
69.943; we obtained 69.785. According to Theorem II the standard
deviation should be $3.115/(10)^{1/2} = .985$; we obtained .894.

It is left as an exercise for the student to verify that the approximations
of the $\alpha$'s are also close.
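The theoretical side of this comparison can be reproduced from the class marks and frequencies of Table 7. The Python sketch below is illustrative and its variable names are assumptions. It recomputes the universe mean and standard deviation, then applies Theorem II and relation (14) for N = 10, taking the universe skewness 0.161 and kurtosis 3.296 as quoted in Table 8.

```python
import math

# Table 7: class marks 58.5, 59.5, ..., 82.5 and their frequencies.
spans = [58.5 + k for k in range(25)]
freqs = [1, 2, 1, 6, 7, 22, 55, 111, 146, 182, 229, 265, 263,
         217, 176, 132, 82, 48, 20, 16, 12, 3, 1, 2, 1]

total = sum(freqs)
mean = sum(f * x for x, f in zip(spans, freqs)) / total
variance = sum(f * (x - mean) ** 2 for x, f in zip(spans, freqs)) / total
sigma = math.sqrt(variance)

N = 10
se_mean = sigma / math.sqrt(N)             # Theorem II, relation (12a)

# Relation (14), using the universe moments quoted in Table 8.
alpha3_universe, alpha4_universe = 0.161, 3.296
alpha3_means = alpha3_universe / math.sqrt(N)
alpha4_means = 3 + (alpha4_universe - 3) / N

print(total, round(mean, 3), round(sigma, 3), round(se_mean, 3))
print(round(alpha3_means, 3), round(alpha4_means, 3))
```

Under these inputs the predicted skewness and kurtosis of the means come out near 0.051 and 3.030, which may be set against the observed 0.052 and 3.030 of Table 8.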

We may think of this universe as approximating a Type III
curve and the distribution of Table 8 as approximating its sampling
curve of means (Figure 18). To represent graphically a universe
and the curve of sample means from that universe would require
analytical expressions for both these distributions. As yet, neither
a type of universe has been specified nor has the functional form
of the curve of means from that universe been determined. However,
Figure 18 will help the student appreciate the meaning of some
of the moment relations developed in § 5.

Fig. 18. Depicting the Sampling Distribution of Means
from a Type III Universe

7. Reproductive Property of Normal Law. An important problem
is to find the distribution function of the sum of several independent
variables when these variables are normally distributed. It suffices
to show how this problem can be solved for the sum of two such
variables. The following discussion follows closely a proof given
by Jackson.

Let x and y be independent variables and normally distributed
about zero as mean with standard deviations $\sigma_1$ and $\sigma_2$, respectively.
Their distribution functions will have the forms

$g(x) = C_1e^{-ax^2}, \qquad h(y) = C_2e^{-by^2}, \qquad a = \dfrac{1}{2\sigma_1^2}, \quad b = \dfrac{1}{2\sigma_2^2};$

the explicit values $1/C_1 = \sigma_1(2\pi)^{1/2}$, $1/C_2 = \sigma_2(2\pi)^{1/2}$, for total
frequency 1, are not needed at the moment.

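If $f(x, y)$ is the joint distribution function for x and y with marginal
distributions $g(x)$ and $h(y)$ we shall first show that the frequency
function, $H(w)$, for the variable $w = x + y$ is

$H(w) = \int_{-\infty}^{\infty} f(x, w - x)\, dx.$

For, $\alpha < w < \beta$ when $\alpha - x < y < \beta - x$; these inequalities define
a strip of the (x, y)-plane for which the corresponding frequency is

$\int_{-\infty}^{\infty}\int_{\alpha - x}^{\beta - x} f(x, y)\, dy\, dx;$

in the integration with respect to y, the substitution $w = x + y$,
$y = w - x$, makes

$\int_{\alpha - x}^{\beta - x} f(x, y)\, dy = \int_{\alpha}^{\beta} f(x, w - x)\, dw,$

and hence

$\int_{-\infty}^{\infty}\int_{\alpha - x}^{\beta - x} f(x, y)\, dy\, dx = \int_{-\infty}^{\infty}\int_{\alpha}^{\beta} f(x, w - x)\, dw\, dx = \int_{\alpha}^{\beta}\int_{-\infty}^{\infty} f(x, w - x)\, dx\, dw = \int_{\alpha}^{\beta} H(w)\, dw.$

We can now proceed with the main part of the proof. Since x
and y are independent, their joint distribution may be written

$f(x, y) = g(x)h(y)$

and so we have, with $a = 1/(2\sigma_1^2)$ and $b = 1/(2\sigma_2^2)$,

$H(w) = \int_{-\infty}^{\infty} f(x, w - x)\, dx = C_1C_2\int_{-\infty}^{\infty} e^{-[ax^2 + b(w - x)^2]}\, dx.$

To evaluate this integral write the exponential expression in the form

$ax^2 + b(w - x)^2 = (a + b)\left[x - \dfrac{bw}{a + b}\right]^2 + \dfrac{ab}{a + b}w^2 = (a + b)z^2 + cw^2,$

where

$z = x - \dfrac{bw}{a + b}, \qquad c = \dfrac{ab}{a + b} = \dfrac{1}{2(\sigma_1^2 + \sigma_2^2)}.$

The value of w being regarded as constant for the integration with
respect to x, so that incidentally $dz = dx$, the expression for $H(w)$
can be written in the form

$H(w) = C_1C_2e^{-cw^2}\int_{-\infty}^{\infty} e^{-(a + b)z^2}\, dz = Ke^{-cw^2},$

where

$K = C_1C_2\int_{-\infty}^{\infty} e^{-(a + b)z^2}\, dz = C_1C_2\left(\dfrac{\pi}{a + b}\right)^{1/2} = \dfrac{1}{\sigma_w(2\pi)^{1/2}},$

and

$\sigma_w^2 = \dfrac{1}{2c} = \sigma_1^2 + \sigma_2^2.$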

If x, y, and u are independent and normally distributed, the
quantity x + y + u can be regarded as the sum of the two independent
normally distributed variables x + y and u, and so is itself
normally distributed. The conclusion can be carried over by induction,
without further calculation, to the sum of any finite number
of variables. Hence we have the following theorem.

Theorem VII. If $x_1, x_2, \ldots, x_N$ are independent variables and
normally distributed with variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2$, the function
$w = \sum_1^N c_ix_i$ is normally distributed with variance $\sigma_w^2 = \sum_1^N c_i^2\sigma_i^2$.

The essential feature of the theorem is the part relating to the 
form of the distribution. This rather remarkable property of a 
linear function of normally distributed variables is sometimes called 
the reproductive property of the normal distribution. The part of 
the theorem relating to the magnitude of the variance follows neces- 
sarily from a general formula which was previously established 
without supposing the variables normally distributed or otherwise 
specialized. 
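The reproductive property can be checked numerically by evaluating the integral for H(w) and comparing it with the normal curve whose variance is the sum of the two variances. The Python sketch below is illustrative only: the standard deviations 1.5 and 2.0, the integration limits, and the function names are assumptions, and the trapezoidal rule stands in for the analytical evaluation given in the proof.

```python
import math

def normal_pdf(x, sigma):
    """Normal curve with mean zero and standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def density_of_sum(w, s1, s2, half_width=12.0, steps=4000):
    """Approximate H(w) = integral of g(x) h(w - x) dx by the trapezoidal rule."""
    a, b = -half_width * s1, half_width * s1
    dx = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * normal_pdf(x, s1) * normal_pdf(w - x, s2)
    return total * dx

s1, s2 = 1.5, 2.0
s_w = math.sqrt(s1 ** 2 + s2 ** 2)          # 2.5, from sigma_w^2 = s1^2 + s2^2
for w in (0.0, 1.0, 3.0):
    print(w, density_of_sum(w, s1, s2), normal_pdf(w, s_w))
```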

Corollary. The sampling distribution of means from a normal
universe is itself normally distributed. The mean of the sampling
distribution is the same as the mean of the universe and its variance is the
variance of the universe divided by the size of the sample.

The proof is left to the student. One should not conclude that it
is generally true that the means of samples of N are distributed
according to the same type of function which specifies the universe
from which they are drawn. But the magnitudes of the mean and
variance, as given in (11) and (12), are general in the sense that they
are true for the sampling distribution of means from any infinite
universe.

8. Non-Normal Universes. From analytic considerations, comparatively
little is known at present about the exact distributions of
statistics for samples drawn from non-normal universes. In a recent
paper, Rietz has listed the contributions and summarized the
progress that has been made in this connection. The reader may
refer to this paper.

With regard to the mean, Theorem III tells us that $\tilde{\alpha}_{3:\bar{x}} \to 0$ and
$\tilde{\alpha}_{4:\bar{x}} \to 3$ as $N \to \infty$. So, even though the universe is far from normal,
if the sample is made large enough, the sampling distribution
of $\bar{x}$ approaches the normal form as characterized by skewness and
kurtosis. (The conditions $\alpha_3 = 0$ and $\alpha_4 = 3$ are necessary but not
sufficient conditions for a normal distribution.) Even for comparatively
small values of N there is sufficient experimental evidence to
consider the distribution of $\bar{x}$ as normal to a high degree of approximation.

Finite Universes. So far we have assumed that the universe
was "infinite," that is, that it was indefinitely large in all its classes,
as compared with the sample. This condition could be satisfied
with a limited supply, for example in the experiment described in
§ 6, by replacement after each individual draw. However, if the
entire sample is drawn from a limited supply before replacement, the
probability of drawing an individual from a given class will be affected
each time that one is drawn from that class. In such a case
the universe is said to be "finite."

If M is the total frequency of a finite universe, the first four
moments of the sampling distributions of $\bar{x}$ are as follows:

(19) $E(\bar{x}) = \tilde{x},$

$\sigma_{\bar{x}}^2 = \dfrac{M - N}{N(M - 1)}\,\sigma_x^2,$

J 


- . _ (M - DiM - 2N) , , 

54:5 == 

N{M-2){M-Z)(M-N) 


Their origin is doubtful. They are more general than the formulas
given in (12) and (14) and reduce to them if $M \to \infty$.
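For a small finite universe the first two relations of (19) can be verified by listing every possible sample. The Python sketch below is illustrative: the five values in `universe` are hypothetical, and the variance of the universe is taken with divisor M. All C(5, 3) = 10 samples drawn without replacement are enumerated and the variance of their means is compared with the finite-universe formula.

```python
from fractions import Fraction
from itertools import combinations

universe = [1, 2, 4, 7, 11]        # hypothetical finite universe, M = 5
M, N = len(universe), 3

mean_u = Fraction(sum(universe), M)
var_u = sum((Fraction(x) - mean_u) ** 2 for x in universe) / M

# Every distinct sample of N variates drawn without replacement.
means = [Fraction(sum(s), N) for s in combinations(universe, N)]
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

predicted = var_u * Fraction(M - N, N * (M - 1))   # second relation of (19)
print(len(means), mean_of_means, var_of_means, predicted)
```

With exact fractions the agreement is an identity rather than an approximation, whatever five values are placed in the universe.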

The conclusion of investigators is that the distribution of means
from nearly any finite universe is practically normal. In this connection
the following striking example is given by Carver.

A group of students chose arbitrarily the following most unusual
distribution for a parent universe:


Table 9

    x        f
   15        9
    3        2
   29       43
  405      189
 1710       37
 Total     280

and found the distribution of $\sum_1^N x_i = N\bar{x}$ of 1000 samples of twenty-five
variates each shown in Table 10.


Table 10

 Class       f
  5,000-     2
  7,000-    54
  9,000-   203
 11,000-   310
 13,000-   254
 15,000-   130
 17,000-    36
 19,000-     9
 21,000-     2
 Total    1000

It was obtained as follows.

Two hundred and eighty Hollerith cards were punched with numbers 
corresponding to the two hundred and eighty variates of the parent 
population. The cards were thoroughly shuffled and then placed 
in a tabulating machine. After twenty-five cards had run through 
the electric tabulator their total was recorded. By repeating this 
procedure one thousand samples were readily obtained. It is thus 
possible to obtain experimentally some appreciation of the sensi- 
tivity of the sampling distribution of means to changes in population 
form. Carver concludes that if the sample N is fifty or larger and 
the population is at least ten times N, the parent population has 
relatively little control over the shape of the distribution of x. 

Another set of experiments was conducted by Shewhart, who
comes to the following conclusion:

Such evidence, supported by more rigorous analytical methods beyond the 
scope of the present discussion, leads us to believe that in almost all cases in 
practice we may establish sampling limits for averages of samples of four or more 
upon the basis of normal law theory. 

9. Tchebycheff's Inequality. In (1) replace x by w, let $\phi(w) = (w - \tilde{w})^2$,
and in the expression for $E\{(w - \tilde{w})^2\}$ replace all values
of w larger than $\tilde{w} + \delta\sigma$ by $\tilde{w} + \delta\sigma$ and all values of w less than
$\tilde{w} - \delta\sigma$ by $\tilde{w} - \delta\sigma$ where $\delta$ is a positive number. Then

(20) $E\{(w - \tilde{w})^2\} \geq A + (\delta\sigma)^2P_\delta,$

where

$A = \int_{\tilde{w} - \delta\sigma}^{\tilde{w} + \delta\sigma} (w - \tilde{w})^2 f(w)\, dw \geq 0,$

and $P_\delta$ is the probability that w lies outside the interval $(\tilde{w} - \delta\sigma,$
$\tilde{w} + \delta\sigma)$. Since $E\{(w - \tilde{w})^2\} = \sigma^2$, from (20) we have

(21) $P_\delta \leq \dfrac{1}{\delta^2},$

and therefore the following theorem.

Theorem VIII. The probability is not more than $1/\delta^2$ that a value
of w taken at random from the universe $f(w)$ will differ from its expected
value by more than a multiple $\delta$ of its standard deviation.

This theorem is known variously as Tchebycheff's theorem,
criterion, or inequality. A striking property is its independence
of the nature of the distribution of w. But the gain in generality 
must be paid for and the price is inadequate information about the 
particular. That is, the inequality (21) may be too wide to be of 
practical value in passing judgments on sampling fluctuations in a 
known or proposed distribution. Nevertheless, it does have some 
useful applications, two of which will now be given. 
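How wide the inequality can be is easily seen on a concrete universe. The Python sketch below is illustrative; it uses the values and frequencies of the parent universe of Table 9, and the function name `tail_probability` is an assumption. It compares the actual proportion of the universe lying more than a given multiple of the standard deviation from the mean with the Tchebycheff bound.

```python
import math

# Parent universe of Table 9: values and their frequencies.
values = [15, 3, 29, 405, 1710]
freqs = [9, 2, 43, 189, 37]
total = sum(freqs)

mean = sum(f * x for x, f in zip(values, freqs)) / total
sigma = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / total)

def tail_probability(delta):
    """Proportion of the universe farther than delta * sigma from the mean."""
    outside = sum(f for x, f in zip(values, freqs) if abs(x - mean) > delta * sigma)
    return outside / total

for delta in (1.5, 2.0, 3.0):
    # actual proportion outside the interval, then the bound 1 / delta^2
    print(delta, tail_probability(delta), 1 / delta ** 2)
```

For a multiple of 2 the bound allows a proportion of .25 while only 37 of the 280 variates, about .13, actually lie outside; for a multiple of 3 none do, although the bound still allows about .11.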

10. Law of Large Numbers. The Bernoulli theorem (Chapter I,
§ 7) can now be established. Let $w = x/s$, x being the number of successes
in s trials. Then $\tilde{w} = p$. Let $P_\delta$ be the probability that
x/s lies outside the interval $(p - \epsilon, p + \epsilon)$, where $\epsilon > 0$. We may
take $\epsilon = \delta(pq/s)^{1/2}$, a multiple of the standard deviation of the
relative frequency x/s. Accordingly, by Theorem VIII we have

$P_\delta \leq \dfrac{1}{\delta^2}.$

Since

$\dfrac{1}{\delta} = \dfrac{(pq/s)^{1/2}}{\epsilon},$

we obtain the inequality

$P_\delta \leq \dfrac{p(1 - p)}{s\epsilon^2}.$

For any assigned $\epsilon$, $P_\delta$ can be made arbitrarily small by increasing s.
Thus x/s becomes increasingly reliable as an estimate of p as s
increases.

The inequality of Tchebycheff can also be used to prove the stability
of the means of large samples. Consider a sample of N from
$f(x)$ in which the variance is $\sigma^2$. Let w be a linear function of the
sample defined by

$w = \dfrac{x_1 + x_2 + \cdots + x_N}{N}.$

Suppose c is a constant such that $\sigma^2 \leq c$. Since $w = \bar{x}$, we have

$\sigma_w^2 = \dfrac{\sigma^2}{N} \leq \dfrac{c}{N}.$

Let P be the probability that $(\bar{x} - \tilde{x})^2 > h^2$. Since $h^2 \geq (Nh^2/c)\sigma_w^2$,
P does not exceed the probability that

$(\bar{x} - \tilde{x})^2 > \dfrac{Nh^2}{c}\,\sigma_w^2.$

Therefore, from Theorem VIII,

$P \leq \dfrac{c}{Nh^2}.$

Since c and h are fixed, P can be made arbitrarily small by taking N
sufficiently large. Hence we have the following theorem.

Theorem IX. The probability that the mean of a sample of N variates
will differ numerically by more than a given positive number h from the
mean of the universe can be made arbitrarily small by taking N sufficiently
large.

Under the conditions of the theorem, $\bar{x}$ is said to converge stochastically
to $\tilde{x}$. This type of convergence, however, should not be confused
with convergence in the sense of analysis.

11. Probability Scale of Sampling Fluctuations. Now that the 
personae dramatis have been assembled, we can state a theorem 
which tells us what the approximate probability is that the mean of 
a sample will deviate by an assigned amount from a hypothetical 
mean. We are assuming here that $\sigma_x$ is known; the case where $\sigma_x$
is unknown will be discussed later.

We know that $\bar{x}$ is (or tends to be) normally distributed about $\tilde{x}$
with standard deviation $\sigma_{\bar{x}} = \sigma_x/\sqrt{N}$. If the distribution of $\bar{x}$ be
reduced to standard units by the transformation

(22) $t = \dfrac{\bar{x} - \tilde{x}}{\sigma_{\bar{x}}},$

then we know that t is approximately normally distributed about
zero with standard deviation of unity. Hence we can refer to a normal
probability scale for the probability that one would obtain a
random sample for which $\bar{x}$ differs from $\tilde{x}$ by as much as $|\delta|$, where
$\delta$ is expressed in the $\sigma_{\bar{x}}$ unit. So we have the following theorem.

Theorem X. The probability $Q_\delta$ that a random sample from an
infinite universe will have a mean, $\bar{x}$, which will be within an interval $\delta$
of the mean, $\tilde{x}$, of the universe is approximately

$Q_\delta = 2\int_0^{\delta} \phi(t)\, dt,$

where $\delta$ is the observed value of t given by (22) and $\phi(t)$ is the normal
curve. Then $P_\delta = 1 - Q_\delta$ is the approximate probability that $\bar{x}$ will
not be within $|\delta|$ of $\tilde{x}$. If the universe is normal, $P_\delta$ gives the exact
probability.

Fig. 19. $Q_\delta$ is the Probability for a Deviation as Small as
$|\delta|$, and $P_\delta$ is the Probability for a Deviation as Large as $|\delta|$
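In place of a printed table of the normal probability scale, the two probabilities of Theorem X can be computed from the error function, since twice the area under the normal curve from 0 to a given deviation equals erf of that deviation divided by the square root of 2. The Python sketch below is illustrative; the function names are assumptions.

```python
import math

def standardize(xbar, universe_mean, sigma_x, n):
    """Transformation (22): deviation of a sample mean in units of sigma_xbar."""
    return (xbar - universe_mean) / (sigma_x / math.sqrt(n))

def q_delta(delta):
    """Probability of a deviation as small as |delta| (Theorem X)."""
    return math.erf(abs(delta) / math.sqrt(2))

def p_delta(delta):
    """Probability of a deviation as large as |delta|."""
    return 1.0 - q_delta(delta)

for delta in (1.0, 1.645, 1.96, 3.0):
    print(delta, round(q_delta(delta), 4), round(p_delta(delta), 4))
```

A deviation of 1.96 corresponds to a probability very near .05 of being exceeded, and a deviation of 3 to about .0027.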

12. Null Hypothesis and Significance Tests. The rationale underlying
sampling theory has been summarized by E. S. Pearson
as follows:

In applying the methods of statistical analysis it is generally our aim to discriminate
between two or more alternative hypotheses regarding the factors
which have controlled certain observed events, which form what we term a
sample or samples. If the process is examined in a little detail it will be found
that the procedure may be described as follows:

(a) We define a hypothesis to be tested. 

(b) We choose the criterion (or criteria) whose numerical value, derivable
from the observations, is most suitable for testing the hypothesis. In
doing this we recognize that the criterion is not a single-valued expression
even if the hypothesis be true, but will vary from one sample of observations
to another.

(c) We therefore refer the observed value of the criterion to this sampling 
distribution — e.g., to a normal probability scale, etc. — and so obtain a 
measure of the likelihood of the hypothesis. 

(d) Finally, if judged on this probability scale the observed criterion is not 
exceptional, we conclude that upon the information available there are no 
grounds for discarding the hypothesis; or if the value prove exceptional 
we consider the possibility of alternative hypotheses. 

An hypothesis which is tested for possible rejection under the
assumption that it is true has been called by Fisher a null hypothesis.
In other words, null hypothesis refers to a particular form of population
distribution which is assumed in considering whether or not a
sample could reasonably have arisen from the population which,
in fact, was assumed. If the sample could not reasonably have
arisen from the population proposed, as measured by a significance
test, we say that the null hypothesis is refuted for the level of significance
adopted. If the significance test yields a verdict of "not
significant" for the probability level adopted, we say that the null
hypothesis is not refuted or contradicted at that level.

It is open to the investigator to be more or less exacting concerning 
the smallness of the probability he would require before he would be 
willing to admit that his test has demonstrated a significant result. 
Good judgment in these matters comes only from much experience 
in the particular field in which the problem occurs. However, it is 
conventional among certain workers to adopt the following rule: 

If $P_\delta > .05$, $\delta$ is not significant;
if $P_\delta < .01$, $\delta$ is significant;
if $.05 > P_\delta > .01$,

our conclusions about $\delta$ are doubtful and we cannot say with much
certainty whether the deviation is significant or not until we have additional
information. Other workers prefer a more conservative level
of significance.

Example 1. Suppose the mean span of 100 persons is found to be $\bar{x}$ = 70.56
inches. Does this differ significantly from the mean $\tilde{x}$ = 69.943 of the "universe"
with standard deviation $\sigma_x$ = 3.115? Calculating the above test we find

$\delta = \dfrac{70.56 - 69.943}{3.115/\sqrt{100}} = 1.99.$

Referring to the normal probability scale we find
the chance of a difference between the observed and hypothetical means as large
as that noted to be $P_\delta$ = .0471. Our conclusion is that the given statistic
$\bar{x}$ = 70.56 is not exceptional, although it is possible that it came from a different
universe, that is, in this case a different race of men.

Example 2. Twelve dice were thrown 26,306 times (Weldon's data), and a
throw of 5 or 6 points was reckoned a success. The mean of the observed distribution
was found to be 4.0524. In tossing a true die the chance of scoring 5 or
6 is $\frac{1}{3}$ so the number of dice scoring 5 or 6 should be distributed with frequencies
proportional to the terms in the expansion $(\frac{2}{3} + \frac{1}{3})^{12}$. Therefore, the expected
mean, on the hypothesis that the dice were true, is $sp = 12(\frac{1}{3}) = 4$. Test this
hypothesis using the difference between the observed and theoretical means as
a criterion of judgment.

Solution. $\sigma_x = (spq)^{1/2} = \{(12)(\frac{1}{3})(\frac{2}{3})\}^{1/2} = 1.633,$

$N = 26{,}306, \qquad \sigma_{\bar{x}} = \dfrac{\sigma_x}{N^{1/2}} = .010, \qquad \delta = \dfrac{.0524}{.010} = 5.2.$

The probability that a deviation outside $\delta = \pm 5$ would happen by chance is
extremely small so we conclude that the dice were biased.


13. Size of Sample to Have a Given Reliability. From Theorem X
we may determine the size N of a sample such that its mean, $\bar{x}$, will
not differ from $\tilde{x}$ by more than a specified error, with a degree of
certainty equal to a specified probability.


Example 3. The American Rolling Mill Company investigated the life of
ferrous materials under different corrosive conditions. Data obtained from a
certain kind of sheet material immersed in Washington tap water showed that
the average time of failure of such sample was 874.89 days and the standard
deviation of the time of failure was 85.31 days. There arose the following
question of practical interest to the research engineer of this company: What
sample size N must be used in order that for similar test conditions, the probability
shall be 0.90 that the average time for failure determined from the N
tests will be in error by not more than 5 per cent of the average of the universe?

Assuming that 874.89 = $\tilde{x}$ and that means of samples of N are distributed
normally, we may answer this question as follows: The allowable error is 5 per
cent of 874.89 days or 43.74 days, and this must correspond to a probability of
0.90. From Theorem X we have

$Q_\delta = 2\int_0^{\delta} \phi(t)\, dt = .90,$

that is

$\int_0^{\delta} \phi(t)\, dt = .45,$

whence from the tables we find $\delta$ = 1.645. Hence N is found by solving the
equation

$1.645\,\dfrac{\sigma_x}{\sqrt{N}} = 43.74,$

where $\sigma_x$ = 85.31. We find N = 10.
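The equation of Example 3 can be solved for N in one line. The Python sketch below is illustrative; the function name `required_n` is an assumption. The exact root comes out slightly above 10, which the example reports as N = 10; if the stated probability of 0.90 had to be met strictly, rounding up to the next whole number would be the cautious choice.

```python
import math

def required_n(delta, sigma_x, allowed_error):
    """Solve delta * sigma_x / sqrt(N) = allowed_error for N."""
    return (delta * sigma_x / allowed_error) ** 2

n_exact = required_n(1.645, 85.31, 43.74)   # values taken from Example 3
n_rounded_up = math.ceil(n_exact)
print(round(n_exact, 2), n_rounded_up)
```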


14. Difference in Proportions. In the analysis of data obtained
by sampling, certain problems occur which relate to the significance
of apparent differences in proportions. Suppose we have two random
samples of size $n_1$ and $n_2$, respectively, with $x_1$ individuals of the
$n_1$ items and $x_2$ of the $n_2$ items which have a certain character or
attribute. The question arises as to whether the observed difference
is merely an accident of sampling or whether a similar difference
exists in the universe. The following theorem may be used to test
the null hypothesis that $x_1/n_1$ and $x_2/n_2$ are random and independent
samples from the same universe.

Theorem XI. If $x_1/n_1$ and $x_2/n_2$ are random and independent
samples from an infinite universe in which p is the proportion of individuals
which have the character in question, the probability that the
difference in the proportions obtained will be numerically as great as the
observed difference $w = |x_1/n_1 - x_2/n_2|$ is approximately $P_\delta$, where
$P_\delta$ is defined in Theorem X, and

$\delta = \dfrac{w}{\{pq(1/n_1 + 1/n_2)\}^{1/2}}.$

Proof. According to the Bernoulli theory, $x_1/n_1$ will vary about
an expected value p with variance $pq/n_1$, where $q = 1 - p$. Similarly,
$x_2/n_2$ will vary about p with variance $pq/n_2$. Then

$E(x_1/n_1 - x_2/n_2) = 0,$

and from (10),

(23) $\sigma_w^2 = \dfrac{pq}{n_1} + \dfrac{pq}{n_2} = pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right).$

Therefore, w varies about zero with variance given by (23), and the
ratio

(24) $t = \dfrac{w}{\sigma_w}$

varies about zero with unit standard deviation.

Information about the form of the t distribution may be obtained
from its higher moments. It is not difficult to show that

(25) $\alpha_3^2 = \dfrac{1 - 4pq}{pq} \cdot \dfrac{(n_1 - n_2)^2}{n_1n_2(n_1 + n_2)}, \qquad \alpha_4 = 3 + \dfrac{1 - 6pq}{pq} \cdot \dfrac{n_1^2 - n_1n_2 + n_2^2}{n_1n_2(n_1 + n_2)}.$

For fixed values of p and q, it is clear that $\alpha_3^2 \to 0$ and $\alpha_4 \to 3$ as the
samples are taken indefinitely large. Even for moderately small
samples the distribution of t does not differ greatly from the normal
form. The following empirical rule, suggested by E. S. Pearson, is
useful when one is in doubt about the propriety of referring (24)
to the normal probability scale.

Rule. Suppose $n_1 < n_2$ (we are at liberty to call either $n_1$). If
$n_1p > 5$, the use of the normal probability scale is justified. If
$n_1p < 5$, examine $\alpha_3^2$. If $\alpha_3^2 < .04$, it is still sufficiently accurate.
But if $\alpha_3^2 \geq .04$, no great confidence can be placed in the test.
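The test of Theorem XI and the accompanying rule can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the functions `proportion_test` and `alpha3_squared`, the sample counts, and in particular the value of p, which is taken as the pooled proportion of the two samples because the universe proportion is not known in this made-up case.

```python
import math

def proportion_test(x1, n1, x2, n2, p):
    """Return (t, P) for the observed difference in proportions, per (23) and (24)."""
    q = 1.0 - p
    w = abs(x1 / n1 - x2 / n2)
    sigma_w = math.sqrt(p * q * (1.0 / n1 + 1.0 / n2))   # relation (23)
    t = w / sigma_w                                        # relation (24)
    p_delta = 1.0 - math.erf(t / math.sqrt(2))            # normal probability scale
    return t, p_delta

def alpha3_squared(n1, n2, p):
    """Squared skewness of t: (1 - 4pq)/(pq) * (n1 - n2)^2 / (n1 n2 (n1 + n2))."""
    q = 1.0 - p
    return (1 - 4 * p * q) / (p * q) * (n1 - n2) ** 2 / (n1 * n2 * (n1 + n2))

# Hypothetical samples: 30 of 100 and 45 of 100 show the attribute.
x1, n1, x2, n2 = 30, 100, 45, 100
p_pooled = (x1 + x2) / (n1 + n2)      # assumed stand-in for the universe proportion
t, p_delta = proportion_test(x1, n1, x2, n2, p_pooled)
normal_scale_ok = min(n1, n2) * p_pooled > 5 or alpha3_squared(n1, n2, p_pooled) < 0.04
print(round(t, 3), round(p_delta, 4), normal_scale_ok)
```

With equal sample sizes the skewness vanishes, so the rule is satisfied on either count; the probability found falls between .01 and .05.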

In order to apply Theorem XI an estimate of p is usually required.
For this purpose

(26) $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$

is usually taken as the best estimate of p which is available from
the samples. It is easy to show that $E(\hat{p}) = p$.

Problems


1. Suppose a variable $w$ is normally distributed and a value is selected at
random. Show that the odds are about 369 to 1 against the value differing
from $E(w)$ by more than $3\sigma_w$.

2. (a) Consider a finite universe of 5 variates: $x_1, x_2, x_3, x_4, x_5$. The number
of distinct samples of 3 variates each that may be drawn is C(5, 3) = 10.
Write these down.

(b) Let $\bar{x}_i$ represent the $i$th sample mean and write down the 10 distinct
sample means. For example,

$\bar{x}_1 = \dfrac{x_1 + x_2 + x_3}{3}.$

(c) Show that the mean of the 10 values of $\bar{x}_i$ is the mean of the 5 values of
$x_i$. Thus,

$\dfrac{1}{10}\sum_{i=1}^{10} \bar{x}_i = \dfrac{1}{5}\sum_{i=1}^{5} x_i.$

What formula does this example illustrate?

3. Show that the expected value of $w^2$ is greater than the square of the expected
value of $w$.

4. From a box containing 2000 discs representing the distribution of span,
draw a sample of 25 and compute its mean and standard deviation. Test
the significance of the difference between your mean and the mean of the
universe, $\tilde{x} = 69.943$ inches.

5. Suppose the weights of a sample of 1000 men of the same age are obtained,
yielding $\bar{x} = 140$ lbs. Assuming that $\sigma_x = 20.0$ lbs., what is the standard
error of the mean of this sample? What is the probability that this mean
does not differ from the mean of the universe at this age by more than
five pounds?

6. (Camp) The mean age of death of men who are alive at age 20 is, in the
United States, 59.13. For the city of Chicago it is 58.98, and in 1910 the
male population of age 20 was 24,000. Can the difference between the
United States and Chicago be explained on the hypothesis of chance?
Assume $\sigma_x = 10$ years, and that the distribution of the universe is
approximately normal.

7. (Camp) A fraternal organization wishes to be very sure that the average
age of death in its group of men now aged 20 will not differ from the
expected 59.13 years by more than one year. By “very sure” it means that
$Q_\delta$ must equal .999 or more. How large should the group be? (Assume
as before that $\sigma_x = 10$.)

8. Given that

$w = \sum_{i=1}^{k} (\mu + x_i).$

If the $x$'s are independent and $\mu$ is a constant, show that

$\sigma_w^2 = \sum_{i=1}^{k} \sigma_i^2,$

where $\sigma_i^2$ represents the variance of $x_i$.

9. Find the mean value of all positive ordinates of the first quadrant of 

-i- 

(a) when equally spaced along the $x$-axis,

(b) when equally spaced along the circle.

Answers: 



10. Find the mean value of all the ordinates of the curve y — a + 6=^ from 0 to x, 
when equally spaced along the $x$-axis.

11. Derive (25). Hint. $\alpha_s = E(t^s) = E(w^s)/\sigma_w^s$.

12. Show that the moment relations in (19) reduce to the corresponding
relations in (12) and (14) if $M \to \infty$.

13. Suppose 300 mice having cancer of about the same degree of malignancy
were divided at random into two groups of $n_1 = 100$ and $n_2 = 200$,
respectively. The first group was given a certain serum treatment which
was withheld from the second group but otherwise the two groups were
treated alike. Among the serum treated there were $x_1 = 8$ deaths, and
among the other group there were $x_2 = 25$ deaths. Test the significance
of the difference between the mortality of 8% and 12½% in the two groups.

14. An instructor had two classes of 20 and 30 students in the same subject. 

Four in the smaller class and 8 in the larger made grades of or better. 
Should one seek a further explanation of this difference beyond variation 
due to sampling? 



CHAPTER VII 

SMALL OR EXACT SAMPLING THEORY 

1. Introduction. A theory of sampling which assumes that $N$ is
large is inadequate for many practical problems. In recent years
a theory has been developed to give more exact methods in dealing
with small samples. In the practical field, the call for the solution
of problems based on comparatively few observations was first
realized in 1908 by a young man, then unknown, who chose to
publish his results under the now celebrated pseudonym of “Student.”
Since then, many important contributions have been made toward
the development and extension of this theory. Its applications are
widespread. In the opinion of the present writer, continuity between
large and small sample theory is an essential part of the newer
attitude. In general, the methods of small sample theory
are applicable to large samples, although the reverse is not true.
It is our purpose in this chapter to facilitate an appreciation of
some of the simpler aspects of this theory. The treatment centers
around significance tests for means, variances, and correlation
coefficients.

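2. Expected Value of $s^2$. By definition, the variance of a sample
is given by

(1)    $s^2 = \dfrac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 = \dfrac{1}{N}\sum_{i=1}^{N} x_i^2 - \bar{x}^2.$

Then the expected value of $s^2$ from repeated samples is

$E(s^2) = E\left[\dfrac{1}{N}(x_1^2 + x_2^2 + \cdots + x_N^2)\right] - E(\bar{x}^2).$

Since the $x_i$ constitute a sample we may write

$E(x_1^2 + x_2^2 + \cdots + x_N^2) = N E(x^2),$

and from (16) of Chapter VI, replacing $y$ by $x$ there, we have

$E(\bar{x}^2) = \dfrac{1}{N}\{E(x^2) + (N - 1)\tilde{x}^2\}.$

Therefore,

$E(s^2) = \dfrac{1}{N}\{N E(x^2)\} - \dfrac{1}{N}\{E(x^2) + (N - 1)\tilde{x}^2\} = \dfrac{N - 1}{N}\{E(x^2) - \tilde{x}^2\}.$

Hence

(2)    $E(s^2) = \dfrac{N - 1}{N}\sigma^2,$

where $\sigma^2$ is the variance of $x$.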

We may also obtain (2) as follows: Consider independent samples
each containing $N$ variates $u_1, u_2, \cdots, u_N$, where $u_i = x_i - \tilde{x}$. For
any sample,

$s^2 = \dfrac{1}{N}\sum u_i^2 - \bar{u}^2 = \dfrac{1}{N}\sum u_i^2 - \dfrac{1}{N^2}\left(\sum u_i\right)^2 = \dfrac{N - 1}{N^2}\sum u_i^2 - \dfrac{2}{N^2}\sum_{i<j} u_i u_j,$

since the square of a sum is equal to the sum of the squares plus twice
the cross-products. Then

$E(s^2) = \dfrac{N - 1}{N^2}E\left(\sum u_i^2\right) - \dfrac{2}{N^2}E\left(\sum_{i<j} u_i u_j\right).$

By Proposition III of Chapter VI the right-hand member of the
above expression may be written

$\dfrac{N - 1}{N^2}\sum E(u_i^2) - \dfrac{2}{N^2}\sum_{i<j} E(u_i u_j),$

which becomes

$\dfrac{N - 1}{N}\sigma^2 - \dfrac{2}{N^2}\sum_{i<j} E(u_i u_j).$

Since $E(u_i u_j) = 0$, by Proposition V, we have the final result

$E(s^2) = \dfrac{N - 1}{N}\sigma^2.$

This result is sometimes stated as in the following theorem.

Theorem I. The mean of the sampling distribution of $s^2$ from an
arbitrary universe equals the variance of the universe multiplied by the
factor* $(N - 1)/N$.

It is to be anticipated that the expected value of $s^2$ is less than
$\sigma^2$, as the following analysis will show. The variance $\sigma^2$ refers to
deviations from $\tilde{x}$, whereas any $s^2$ refers to deviations from an $\bar{x}$. For any
sample, then, we may regard $\tilde{x}$ as an arbitrary origin. Since in the
case of any sample, the sum of the squares of deviations from its
mean, $\bar{x}$, is less than the sum of the squares of deviations of the same
variates from an arbitrary point $\tilde{x}$ (unless the sample is one whose
mean falls at $\tilde{x}$), it is to be expected that the mean of all the values
of $s^2$ will be less than $\sigma^2$. Relation (2) measures the extent of this
inequality.
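Relation (2) can be checked exactly on a small example. The sketch below assumes a hypothetical universe of five equally likely values and samples of $N = 3$ drawn independently (that is, with replacement), so that every ordered triple is equally likely; exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product


def variance_divisor_n(values):
    """Variance with divisor N, as in definition (1)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / n


universe = [1, 2, 3, 4, 6]      # hypothetical universe, equally likely values
N = 3                           # sample size

sigma_sq = variance_divisor_n(universe)

# Every ordered sample of N independent draws is equally likely.
all_samples = list(product(universe, repeat=N))
expected_s_sq = sum(variance_divisor_n(s) for s in all_samples) / len(all_samples)

print(sigma_sq, expected_s_sq, Fraction(N - 1, N) * sigma_sq)
```

The mean of the 125 sample variances agrees exactly with $(N - 1)\sigma^2/N$, which is what (2) asserts for an arbitrary universe.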

3. Unbiased Estimates of Population Parameters. A distribution
function is not only a function of the variable involved, but it is also
a function of the parameters, or hypothetical quantities, which are
introduced to specify the universe sampled. In the case of a
Bernoulli distribution the parameter is $p$, in the Poisson law
it is $m$, and in a normal distribution there are two parameters,
$\tilde{x}$ and $\sigma$.

A function of the variates given by a sample for estimating a
parameter is called a statistic. Let $\hat{\theta}$ be a statistic corresponding to
a parameter $\theta$ in the universe. We now state the following

Definition. If the expected value of $\hat{\theta}$, $E(\hat{\theta})$, equals $\theta$, then $\hat{\theta}$ is
called an unbiased estimate of $\theta$.

It is clear from Theorem I of Chapter VI that the mean of a sample
is an unbiased estimate of the mean of the universe. Also from (26)
of Chapter VI we see that $\hat{p}$ defined there is an unbiased estimate
of $p$.

Before the relation $\sigma_{\bar{x}}^2 = \sigma^2/N$ can be of much use to us in the
applications we must have an estimate of $\sigma^2$ from the sample or
samples available. By Proposition I of the preceding chapter,

$E\left[\dfrac{N}{N - 1}s^2\right] = \dfrac{N}{N - 1}E(s^2) = \sigma^2$    by (2).


* This factor is sometimes called “Bessel's correction.” Perhaps it should
be attributed more appropriately to Gauss, who made use of it, in this
connection, as early as 1823.

Let $\hat{\sigma}^2$ be an unbiased estimate of $\sigma^2$. If this estimate is based on
a single sample we have

(3)    $\hat{\sigma}^2 = \dfrac{N}{N - 1}s^2 = \dfrac{\sum (x_i - \bar{x})^2}{N - 1}.$

If $N - 1 = n$ it is obvious that

(3a)    $\hat{\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^{n+1}(x_i - \bar{x})^2.$

It is conventional to take

(4)    $\hat{\sigma} = s\left(\dfrac{N}{N - 1}\right)^{1/2}$

as an estimate of $\sigma$. If $N$ is large the difference between unity and
the coefficient of $s$ in (4) is negligible in numerical problems. With
$N$ large it would not be invalid, to any appreciable extent, to use $s$
as an estimate of $\sigma$.

If two independent samples are available from the same universe,
an unbiased estimate based on the two samples is given by

(5)    $\hat{\sigma}^2 = \dfrac{q}{N - 2},$

where

$q = N_1 s_1^2 + N_2 s_2^2, \qquad N = N_1 + N_2,$

$s_1^2$ and $s_2^2$ being the variances of samples consisting of $N_1$ and $N_2$
variates, respectively. It is left as an exercise for the student to
verify that the expected value of $q/(N - 2)$ is $\sigma^2$.

In case $k$ independent samples are available from the same universe,
we may generalize (5) and write

(6)    $\hat{\sigma}^2 = \dfrac{Q}{U - k},$

where

$Q = N_1 s_1^2 + N_2 s_2^2 + \cdots + N_k s_k^2,$

$U = N_1 + N_2 + \cdots + N_k,$

and $s_i^2$ is the variance in the $i$th sample consisting of $N_i$ variates.
When $\hat{\sigma}^2$ is used in future discussions it will be clear from the context
whether this estimate is based on 1, 2, or $k$ samples.
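The estimates based on one, two, or $k$ samples all have the form $Q/(U - k)$, and may be computed by a single routine. The following is a minimal sketch; the function names and the two small samples are hypothetical and serve only to illustrate the arithmetic.

```python
def variance_divisor_n(sample):
    """Sample variance s^2 with divisor N."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


def pooled_unbiased_variance(samples):
    """Unbiased estimate Q / (U - k) of the universe variance from k samples."""
    Q = sum(len(s) * variance_divisor_n(s) for s in samples)   # sum of N_i * s_i^2
    U = sum(len(s) for s in samples)                           # total number of variates
    k = len(samples)                                           # number of samples
    return Q / (U - k)


sample_a = [2.0, 4.0, 6.0]          # hypothetical data
sample_b = [1.0, 3.0, 5.0, 7.0]     # hypothetical data

one_sample = pooled_unbiased_variance([sample_a])              # N s^2 / (N - 1)
two_samples = pooled_unbiased_variance([sample_a, sample_b])   # q / (N - 2)
print(one_sample, two_samples)
```

With a single sample the routine returns $Ns^2/(N - 1)$; with two samples it returns $q/(N - 2)$.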

If $N_i = N$ is the same for every sample, (6) reduces to

(7)    $\hat{\sigma}^2 = \dfrac{N(s_1^2 + s_2^2 + s_3^2 + \cdots + s_k^2)}{U - k},$

where $U = Nk$. Clearly, (7) may be written in the form

(7a)    $\hat{\sigma}^2 = \dfrac{N}{N - 1} \cdot \dfrac{s_1^2 + s_2^2 + s_3^2 + \cdots + s_k^2}{k}.$

When $k$ is taken infinitely large so that $U$ becomes the universe, the
right member of (7a) then refers to the expected value of $s^2$ and
$\hat{\sigma}^2$ becomes $\sigma^2$ itself. So as $k \to \infty$ the limiting value of (7a) becomes

$\sigma^2 = \dfrac{N}{N - 1}E(s^2),$

as given in (2).

As an alternate to (7), in the case where all samples contain the
same number of variates, we may take

(8)    $\hat{\sigma} = \dfrac{1}{b(N)} \cdot \dfrac{s_1 + s_2 + \cdots + s_k}{k} = \dfrac{1}{b(N)} \times$ mean value of standard deviations,

where $b(N)$ is a function of $N$ and approaches unity as $N$ increases.
The exact expression for $b(N)$ will be derived in § 7. Its approximate
value is $b(N) = 1 - 3/(4N)$. As $k \to \infty$ the limiting value of
(8) becomes

(9)    $\sigma = \dfrac{E(s)}{b(N)}.$

In § 7 we will prove that $b(N)\sigma$ is the mean of the sampling
distribution of $s$ from a normal universe whose standard deviation is $\sigma$.
Values of $b(N)$ and its reciprocal have been tabulated by E. S.
Pearson and others, and we have included a short table in § 7.

As an alternate to (4) we have from (8) when $k = 1$,

(10)    $\hat{\sigma} = \dfrac{s}{b(N)}.$

4. Degrees of Freedom. In § 2 we have proved, essentially, that
the expected value of $\sum (x_i - \bar{x})^2$ is $(N - 1)\sigma^2$, where the $N$ values
of $x$ in the sample are subject to the linear restriction $\sum x_i = N\bar{x}$.
This is equivalent to proving that the expected value of $\sum x_i^2$ is
$(N - 1)\sigma^2$ when the $x$'s are subject to the linear restriction $\sum x_i = 0$.
Suppose, however, that there are $k < N$ linear restrictions on the $x$'s.
What, then, is the expected value of $\sum x_i^2$? A. T. Craig has proved
analytically that if $x_1, x_2, \cdots, x_N$ are $N$ independent values of a
variable which is normally distributed about zero with variance $\sigma^2$,
and if the $N$ values of $x$ are subject to $k < N$ homogeneous linear
restrictions, then the expected value of $\sum x_i^2$ is $(N - k)\sigma^2$. The number
$n = N - k$ is frequently called the number of degrees of freedom.

5. “Student's” Distribution. The formula used in testing a null
hypothesis that a given sample comes from a universe with a proposed
mean is

(11)    $t = \dfrac{(\bar{x} - \tilde{x})(N)^{1/2}}{\sigma}.$

As stated in Chapter VI, (11) is normally distributed if the universe
is normal. On the side of applications, $\sigma$ is seldom available and
usually must be estimated from the data available. If we substitute
into (11) the estimate of $\sigma$ given in (4) and calculate

(12)    $t = \dfrac{(\bar{x} - \tilde{x})(N - 1)^{1/2}}{s},$

we are not justified in asserting that (12) is normally distributed
unless $N$ is large. And so, in testing the significance of the mean
of a small sample we are not justified in referring (12) to a normal
probability scale. The variability of $s$ from sample to sample
invalidates that procedure.

While Helmert obtained the distribution of $s^2$ as early as 1876
it seems that “Student” was the first to recognize the importance,
for the theory of small samples, of taking account of the variability
of $s$ in (12). By means of a remarkable intuition he obtained,
somewhat empirically, the joint distribution function of $\bar{x}$ and $s$ from
a normal universe. Later writers, notably Fisher, established his
results rigorously.

“Student” actually found the distribution of a slightly different
variable, viz.,

(13)    $z = \dfrac{\bar{x} - \tilde{x}}{s}.$

Obviously, $z$ is functionally related to $t$ by

(14)    $z = t(N - 1)^{-1/2},$

so the distribution of $t$ can easily be obtained from that of $z$. In
deriving the distribution of $z$ we shall follow the proof given by
Fisher. To avoid interrupting the main development some of the
details will be deferred to the next section.

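Consider a normal universe with frequency element

$df = (2\pi\sigma^2)^{-1/2} e^{-(x - \tilde{x})^2/2\sigma^2}\,dx.$

Let a sample $(x_1, x_2, \cdots, x_N)$ be taken at random from it. Then the
probability that the sample will lie in the element of volume

$dv = dx_1\,dx_2 \cdots dx_N$

is

(15)    $dF = (2\pi\sigma^2)^{-N/2} e^{-\sum (x_i - \tilde{x})^2/2\sigma^2}\,dv.$

From the relation* between the second moment about the mean and the
second moment about an arbitrary origin, we have

$\sum (x_i - \tilde{x})^2 = Ns^2 + N(\bar{x} - \tilde{x})^2.$

Hence (15) may be written

(16)    $dF = (2\pi\sigma^2)^{-N/2} e^{-\{Ns^2 + N(\bar{x} - \tilde{x})^2\}/2\sigma^2}\,dv.$

By means of $N$-dimensional geometry (to be explained in § 6) Fisher
showed that the element of volume $dv$ can be expressed in terms of
the variation of $\bar{x}$, namely, $d\bar{x}$, and the variation in volume, $s^{N-2}\,ds$,
of an $(N - 1)$-dimensional hypersphere of radius $s\sqrt{N}$, so that

(17)    $dv = C s^{N-2}\,d\bar{x}\,ds,$

where $C$ is a constant. Then (16) becomes

(18)    $dF = C(2\pi\sigma^2)^{-N/2} s^{N-2} e^{-\{Ns^2 + N(\bar{x} - \tilde{x})^2\}/2\sigma^2}\,ds\,d\bar{x}.$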

From (18) the distribution of $z$ can be deduced. From (13) we
obtain $d\bar{x} = s\,dz$ for a fixed value of $s$. Substituting in (18) we
obtain, for the joint distribution of $s$ and $z$,

(19)    $dF = C(2\pi\sigma^2)^{-N/2} s^{N-1} e^{-Ns^2(1 + z^2)/2\sigma^2}\,ds\,dz.$

This expression is defined for $s \geq 0$, being identically zero for $s < 0$
since $s$ is taken as the positive square root of $s^2$. If $s$ is integrated
out of (19), the distribution of the single variable $z$ is obtained.

*Cf. Part I, Ch. IV, § 9.

To perform this integration, let

$y = s(1 + z^2)^{1/2}, \qquad ds = (1 + z^2)^{-1/2}\,dy,$

and integrating with respect to $y$ from 0 to $\infty$, we have

$\left\{C(2\pi\sigma^2)^{-N/2}\int_0^\infty y^{N-1} e^{-Ny^2/2\sigma^2}\,dy\right\}(1 + z^2)^{-N/2}\,dz,$

which reduces to

$K(1 + z^2)^{-N/2}\,dz,$

where, as shown in § 4 of Chapter II,

$K = \dfrac{\Gamma(N/2)}{\sqrt{\pi}\,\Gamma\{(N - 1)/2\}}.$

Therefore, the distribution function for “Student's” $z$ is

(21)    $F(z) = \dfrac{\Gamma(N/2)}{\sqrt{\pi}\,\Gamma\{(N - 1)/2\}}(1 + z^2)^{-N/2}.$

The curve is symmetrical with mean zero and infinite range. It is
quite different, however, in mathematical character from the normal
curve although it approaches this form as $N \to \infty$. (Cf. § 9.)
From the viewpoint of sampling theory the important property of
(21) is its independence of $\sigma$. The revolutionary character of this
property is revealed in certain applications that involve drawing
probable inferences from small samples, say from a sample of $N = 10$.

Using (14) Fisher modified (21) and obtained the distribution
of $t$ which is the one now widely applied. Before discussing the
$t$-distribution, we shall give the details of Fisher's derivation of (21)
and consider the separate distributions of $\bar{x}$, $s^2$, and $s$.

6. Fisher's Derivation. Making use of the geometrical method
employed by Fisher we shall imagine an $N$-dimensional space in
which we take the origin at the point $O(\tilde{x}, \tilde{x}, \cdots, \tilde{x})$ and rectangular
axes $Ou_1, Ou_2, \cdots, Ou_N$. A point can be located in a space of a
specified number of dimensions by associating with the point a set
of numbers. Therefore, we may represent the sample by the point
$P(u_1, u_2, \cdots, u_N)$ where $u_i = x_i - \tilde{x}$. Although it is impossible to
visualize a space of $N$ dimensions for $N > 3$ we will carry through the
argument for the general case by analogy with the case for $N = 3$.
So we consider the latter case first.

When $N = 3$, the sample is represented by the point $P(u_1, u_2, u_3)$
and we have the mean $\bar{u}$ and variance $s^2$ defined by

(a)    $u_1 + u_2 + u_3 = 3\bar{u}$

and

(b)    $(u_1 - \bar{u})^2 + (u_2 - \bar{u})^2 + (u_3 - \bar{u})^2 = 3s^2.$

For an assigned $\bar{u}$, (a) represents a plane; and, for an assigned pair
of values of $(\bar{u}, s)$, (b) represents a sphere with center at the point
$M(\bar{u}, \bar{u}, \bar{u})$. The line

(c)    $u_1 = u_2 = u_3$

has direction cosines each equal to $1/(3)^{1/2}$ and is normal to the
plane (a). The perpendicular distance of $P$ from this line is

$MP = s(3)^{1/2}$

as can be seen from (b). We require the probability, to within
infinitesimals of order higher than $d\bar{u}\,ds$, of getting a sample of
$N = 3$ independent values of $u$ which will simultaneously yield
values of $\bar{u}$ and $s$ which lie within the region bounded by $\bar{u}$, $\bar{u} + d\bar{u}$
and $s$, $s + ds$. Following the method of § 5, an element of this
probability density is given by the expressions

$dF = (2\pi\sigma^2)^{-3/2} e^{-(u_1^2 + u_2^2 + u_3^2)/2\sigma^2}\,dv = (2\pi\sigma^2)^{-3/2} e^{-3(\bar{u}^2 + s^2)/2\sigma^2}\,dv,$

where

$dv = du_1\,du_2\,du_3.$

As the sample point $P(u_1, u_2, u_3)$ varies, $\bar{u}$ and $s$ also vary.
Corresponding to different values of $s$ there is a set of concentric spheres
defined by (b) all having the same center. Since the plane (a)
passes through the common center of the spheres, the region $dv$ is
a shell between concentric spheres of radii $\sqrt{3}\,s$ and $\sqrt{3}(s + ds)$.
To use a homely illustration, $dv$ corresponds to one of the successive
layers in an onion. Our problem is to express $dv$ in terms of $\bar{u}$, $s$,
$d\bar{u}$, and $ds$. Now the line (c) meets the plane (a) at $M$ and the
distance $OM$ is

$OM = (3)^{1/2}\bar{u},$

so we have the differential element

$d(OM) = (3)^{1/2}\,d\bar{u}.$

Since the plane (a) passes through $M$, the intersection of the plane
and sphere is a great circle with center at $M$ and radius equal to
$s(3)^{1/2}$. The area of this circle is

$A = 3\pi s^2$

and the differential element $dA$ is

$dA = 6\pi s\,ds.$

Therefore, within infinitesimals of higher order,

$dv = dA\,d(OM) = C_1 s\,ds\,d\bar{u},$

where here and hereafter, in this section, the $C$'s are constants. So,
the required probability is

$dF = C_2\,e^{-3(\bar{u}^2 + s^2)/2\sigma^2}\,s\,ds\,d\bar{u}.$

Passing now to the general case involving $N$-space, let $P$ be the
point representing the sample $(u_1, u_2, \cdots, u_N)$. Then $PM$ is the
perpendicular from $P$ upon the line

(d)    $u_1 = u_2 = \cdots = u_N$

and we have

$OM = (N)^{1/2}\bar{u}, \qquad OP^2 = \sum u_i^2,$

$MP^2 = OP^2 - OM^2 = \sum u_i^2 - N\bar{u}^2 = Ns^2.$

In $N$-space, the plane (a) generalizes into the hyperplane

(e)    $\sum u_i = N\bar{u},$

and the sphere (b) generalizes into the hypersphere

(f)    $\sum (u_i - \bar{u})^2 = Ns^2$

with radius $MP = (N)^{1/2}s$ and center at $(\bar{u}, \bar{u}, \cdots, \bar{u})$. Now, the
hyperplane (e) will intersect the hypersphere (f) in an $(N - 1)$-dimensional
hypersphere to correspond to the circle for the case
$N = 3$. Consequently, for a given pair of values of $\bar{u}$ and $s$, the
point $P$ will lie on an $(N - 1)$-dimensional hypersphere orthogonal to
the line $OM$. The volume of this $(N - 1)$-hypersphere is given by

$A = C_1(\sqrt{N}\,s)^{N-1}$

and so

$dA = C_2\,s^{N-2}\,ds.$

Therefore, the volume $dv = du_1\,du_2 \cdots du_N$ between two concentric
spheres of radius $\sqrt{N}\,s$ and $\sqrt{N}(s + ds)$ is approximately

$dv = dA\,d(OM) = C s^{N-2}\,ds\,d\bar{u}.$

Since $du_i = dx_i$ and $d\bar{u} = d\bar{x}$, (17) is established.

7. Distributions of $\bar{x}$, $s^2$, and $s$, Taken Singly. It is clear that
(18) may be written as follows:

(22)    $dF = \left\{k_1 e^{-N(\bar{x} - \tilde{x})^2/2\sigma^2}\,d\bar{x}\right\}\left\{k_2 (s^2)^{(N-3)/2} e^{-Ns^2/2\sigma^2}\,d(s^2)\right\}.$

From this factored form it follows that

(a) The law of distribution, $G(\bar{x})$, of sample means from a normal
universe is given by

(23)    $G(\bar{x}) = k_1 e^{-N(\bar{x} - \tilde{x})^2/2\sigma^2},$

it being fairly obvious from the form of $G(\bar{x})$ that

(24)    $k_1 = \left(\dfrac{N}{2\pi\sigma^2}\right)^{1/2}.$

It may also be evaluated by imposing the condition that

$\int_{-\infty}^{\infty} G(\bar{x})\,d\bar{x} = 1.$

Evidently, $G(\bar{x})$ is a normal distribution with mean equal to $\tilde{x}$ and
standard deviation equal to $\sigma/(N)^{1/2}$, a result already familiar from
Chapter VI.

(b) The variance, $s^2$, of a sample is distributed according to

(25)    $H(s^2) = k_2 (s^2)^{(N-3)/2} e^{-Ns^2/2\sigma^2},$

where (see § 4, Chapter II)

(26)    $k_2 = \dfrac{1}{\Gamma\{(N - 1)/2\}}\left(\dfrac{N}{2\sigma^2}\right)^{(N-1)/2}.$

Thus the distribution of the variance was found by first finding the
simultaneous distribution of the variance and mean. Clearly,
$H(s^2)$ is a Pearson Type III curve with range limited at one end,
$s^2 = 0$, but not at the other, $s^2 = \infty$.

(c) The variance, $s^2$, and the mean, $\bar{x}$, are distributed quite
independently, that is,

$F(\bar{x}, s^2) = G(\bar{x})H(s^2).$

It has recently been proved by Geary that a necessary and sufficient
condition that $\bar{x}$ and $s^2$ from samples of $N$ values of $x$ be independent
in the probability sense is that the $x$'s be normally distributed in the
parent universe.

In § 2, the mean of the sampling distribution of $s^2$ from an arbitrary
universe was obtained. It is interesting to verify that result in the
present case where we know the distribution function. The mean
of the distribution of variances of samples of $N$ from a normal
universe is given by

$E(s^2) = \int_0^\infty H(s^2)\,s^2\,d(s^2),$

where $H(s^2)$ is defined in (25). So we have

$E(s^2) = k_2\int_0^\infty (s^2)^{(N-1)/2} e^{-Ns^2/2\sigma^2}\,d(s^2) = k_2\left(\dfrac{2\sigma^2}{N}\right)^{(N+1)/2}\Gamma\left(\dfrac{N + 1}{2}\right) = \dfrac{N - 1}{N}\sigma^2.$

The standard deviation of the $H(s^2)$ distribution is, approximately,

$\sigma_{s^2} = \sigma^2(2/N)^{1/2}.$

The distribution of the standard deviations of samples of $N$ from
a normal universe is readily found from (25) and (22) to be

(27)    $h(s) = 2k_2\,s^{N-2} e^{-Ns^2/2\sigma^2}.$

So its mean value is given by

$E(s) = \int_0^\infty h(s)\,s\,ds,$

which yields the result

$E(s) = k_2\left(\dfrac{2\sigma^2}{N}\right)^{N/2}\Gamma\left(\dfrac{N}{2}\right).$

Upon substituting the value of $k_2$ given in (26), the above expression
becomes

$E(s) = \left(\dfrac{2}{N}\right)^{1/2}\dfrac{\Gamma(N/2)}{\Gamma\{(N - 1)/2\}}\,\sigma.$

If we denote this coefficient of $\sigma$ by $b(N)$ we have the relation

$E(s) = b(N)\sigma,$

which was promised in § 3. Romanovsky showed that

$b(N) = 1 - \dfrac{3}{4N} - \dfrac{7}{32N^2} - \cdots.$

Table 11 gives values of the reciprocal of $b(N)$ for a few values of $N$.

Romanovsky also deduced the standard deviation of the $h(s)$
distribution to be
_^ / 1 2 3 _ y/s 

The approximate value

$\sigma_s = \sigma\left(\dfrac{1}{2N}\right)^{1/2}$

is frequently used in practice and this is the basis for the common
statement that the standard error of a standard deviation is $1/(2)^{1/2}$
times that of a mean.

The modal value of $s$, easily found* by differentiating $h(s)$, is

$\check{s} = \sigma\{(N - 2)/N\}^{1/2}.$
Table 11

    N       1/b(N)
    2       1.772
    3       1.382
    4       1.253
    5       1.189
    6       1.151
    7       1.126
    8       1.108
    9       1.094
    10      1.084
    20      1.040
    30      1.026
    50      1.015
    100     1.008
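The entries of Table 11 may be recomputed from the Gamma-function expression for the mean of $s$ in samples from a normal universe, namely $b(N) = (2/N)^{1/2}\,\Gamma(N/2)/\Gamma\{(N-1)/2\}$. The sketch below is an illustration only; the function name `b` is ours, and logarithms of the Gamma function are used so that large $N$ causes no overflow.

```python
from math import exp, lgamma, log


def b(N):
    """Coefficient b(N) in E(s) = b(N) * sigma for samples of N from a normal universe."""
    return exp(0.5 * log(2.0 / N) + lgamma(N / 2.0) - lgamma((N - 1) / 2.0))


# Values of 1/b(N) as printed in Table 11.
table_11 = {2: 1.772, 3: 1.382, 4: 1.253, 5: 1.189, 6: 1.151, 7: 1.126,
            8: 1.108, 9: 1.094, 10: 1.084, 20: 1.040, 30: 1.026, 50: 1.015,
            100: 1.008}

for N, printed in table_11.items():
    exact = 1.0 / b(N)
    approx = 1.0 / (1.0 - 3.0 / (4.0 * N))     # from b(N) = 1 - 3/(4N), roughly
    print(N, printed, round(exact, 4), round(approx, 4))
```

The third printed column shows how quickly the approximation $b(N) = 1 - 3/(4N)$ becomes serviceable; it is poor for $N = 2$ and close for $N \geq 10$.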


If we make the substitution $y = s - \check{s}$, then the distribution of $y$ is,
to a first approximation, the normal curve

(31)    Const. $\times\ e^{-Ny^2/\sigma^2}$

with standard deviation $\sigma/(2N)^{1/2}$.

* If we make $h(s)$ a maximum for variation in $\sigma$ we find that
$\sigma = sN^{1/2}/(N - 1)^{1/2}$ or $s = \sigma\{(N - 1)/N\}^{1/2}$ (cf. Rider).

8. The $(\bar{x}, s)$-Frequency Surface. We may regard $F(\bar{x}, s)$ as
describing a frequency surface if the total volume under the surface
represents the expected frequency of the means and standard
deviations of all possible samples of size $N$. In depicting this surface it is
convenient to let $u = \bar{x} - \tilde{x}$ so that the origin of $u$ is at $\bar{x} = \tilde{x}$.
Since

$\int_0^\infty\int_{-\infty}^{\infty} F(\bar{x}, s)\,d\bar{x}\,ds = 1,$

then the volume under the surface over a closed contour in the
$us$-plane represents the proportion or percentage of samples whose
means and standard deviations fall simultaneously within the ranges
defined by the boundary of the given contour. In an illuminating
paper by Deming and Birge two such frequency surfaces are
represented. These are reproduced in Figure 21, one for a small value
of $N$ and the other for a comparatively large value of $N$.

Fig. 21. The Surface $F(\bar{x}, s)$ Illustrated by Sections

As the authors point out, the highest point of the surface has the
coordinates $u = 0$, $s = \sigma\{(N - 2)/N\}^{1/2}$. “Because of the
independence of $\bar{x}$ and $s$, all plane sections $s$ = constant will be normal
curves with standard deviations equal to $\sigma/(N)^{1/2}$. The $u$ = constant
sections will be skew curves whose equations are given by $h(s)$.
They will all have the same mean and mode. As $N$ increases, their
mean and mode approach coincidence with the value $\sigma$ while the
curves lose their skewness and become normal with center at $s = \sigma$
and standard deviations equal to $\sigma/(2N)^{1/2}$. As $N$ increases, the
surface becomes more and more concentrated about the point
$u = 0$, $s = \sigma$.”

9. Fisher's $t$-Distribution. Substituting (14) into (21) and
replacing $N - 1$ by $n$ we obtain

(32)    $F_n(t) = K_n\left(1 + \dfrac{t^2}{n}\right)^{-(n+1)/2},$

where $1/K_n = n^{1/2}B(n/2, 1/2)$, $B$ being the Beta function.

Inasmuch as (32) is independent of $\sigma$, it can be used in situations
in which the value of $\sigma$ is unknown. The quantity $t$ involves no
hypothetical quantities, being completely expressible in terms of the
variates.

In 1925, “Student” published in Metron an extensive table of
the probability integral $\int_{-\infty}^{t} F_n(t)\,dt$. More recently, Fisher has
given a short table of the probability $P$ of occurrence of deviations
outside $\pm t$, for values of $t$ and $n$ commonly met in applications of
small sample theory. Let

$P_n(t) = 2\int_0^t F_n(t)\,dt.$

Then the probability $P$ tabulated by Fisher is

(33)    $P = 1 - P_n(t).$

Fisher's table gives successive columns showing for each value of $n$,
from $n = 1$ to $n = 30$, the values of $t$ for which $P$ takes the values
given at the head of the columns. A general idea of the table may be
obtained from the portion which we have reproduced* in Table 12.


Table 12. Values of t from Table IV of Fisher's Text

    n       P = .9    .7       .5       .1       .05      .01
    3       .137      .424     .765     2.353    3.182    5.841
    4       .134      .414     .741     2.132    2.776    4.604
    5       .132      .408     .727     2.015    2.571    4.032
    6       .131      .404     .718     1.943    2.447    3.707
    8       .130      .399     .706     1.860    2.306    3.355
    10      .129      .397     .700     1.812    2.228    3.169
    15      .128      .393     .691     1.753    2.131    2.947
    20      .127      .391     .687     1.725    2.086    2.845
    30      .127      .389     .683     1.697    2.042    2.750
    ∞       .1257     .3853    .6745    1.6449   1.9600   2.5758


The number $n$, with which to enter the table, is determined by the
number of degrees of freedom involved in the available estimate (§ 3)
of $\sigma$. In testing null hypotheses the rule given in § 12 of Chapter VI
may be used, where, of course, $P_\delta$ is to be replaced now by $P$.

* With Fisher's permission and that of his publishers, Oliver and Boyd.

The distribution of $t$ (as well as that of $z$) approaches the normal
type as $n \to \infty$. This may be established as follows. Using Stirling's
approximation on the coefficient $K_n$ in (32), we obtain, after some
algebraic simplification, an expression from which it is easy to show that

$\lim_{n \to \infty} K_n = (2\pi)^{-1/2}.$

The rest of the $t$ function may be written as

$\left(1 + \dfrac{t^2}{n}\right)^{-n/2}\left(1 + \dfrac{t^2}{n}\right)^{-1/2},$

which, when $n = \infty$, becomes $e^{-t^2/2}$. Therefore,

$\lim_{n \to \infty} F_n(t) = (2\pi)^{-1/2} e^{-t^2/2}.$

The entries in the last line of Fisher's table, corresponding to $n = \infty$,
are the deviations from the mean of a normal curve with unit standard
deviation.
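The approach of (32) to the normal form can be observed numerically. In the sketch below, which is an illustration rather than part of the proof, the coefficient is computed from $1/K_n = n^{1/2}B(n/2, 1/2)$ with the Beta function expressed through Gamma functions; the function names are ours.

```python
from math import exp, lgamma, log, pi, sqrt


def K(n):
    """Coefficient K_n of (32), from 1/K_n = sqrt(n) * B(n/2, 1/2)."""
    log_beta = lgamma(n / 2.0) + lgamma(0.5) - lgamma((n + 1) / 2.0)
    return exp(-0.5 * log(n) - log_beta)


def F(n, t):
    """Ordinate F_n(t) of the t-distribution with n degrees of freedom."""
    return K(n) * (1.0 + t * t / n) ** (-(n + 1) / 2.0)


def normal_ordinate(t):
    """Ordinate of the normal curve with unit standard deviation."""
    return exp(-t * t / 2.0) / sqrt(2.0 * pi)


for n in (1, 4, 30, 1000, 10 ** 6):
    print(n, K(n), F(n, 1.0), normal_ordinate(1.0))
```

For $n = 1$ the coefficient is $1/\pi$, and as $n$ increases it rises toward $(2\pi)^{-1/2}$, while the ordinate at any fixed $t$ approaches the normal ordinate.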

According to “Student,” the distribution of $z$ tends to approach
a normal curve with a standard deviation of $(N - 3)^{-1/2}$ for large
values of $N$. Deming and Birge (loc. cit.) have suggested that the
distribution tends to approach normality with {N — as 

standard deviation. Anyhow, for large values of $N$, $z(N - 3)^{1/2}$
would be approximately normally distributed about zero with unit
standard deviation. Since

$z = \dfrac{\bar{x} - \tilde{x}}{s},$

it is frequently satisfactory in applications to refer

(34)    $\dfrac{(\bar{x} - \tilde{x})(N - 3)^{1/2}}{s}$

to a normal probability scale when $N > 30$.

For large values of $N$, (34) represents so small a refinement over
(22) of Chapter VI that the additional computation seems
unwarranted. So when $N$ considerably exceeds 30 the older procedure of
replacing $\sigma$ by $s$ and treating $t = (\bar{x} - \tilde{x})(N)^{1/2}/s$ as though it were
normally distributed with unit standard deviation is not appreciably
erroneous.

10. Difference Between Two Means. Fisher demonstrated that
(32) has a much wider range of application than the problem for
which it was designed. He showed that the $t$-distribution is
applicable whenever we are dealing with a normally distributed variate
whose standard deviation is not known exactly but is independently
estimated from observations amounting to $n$ degrees of freedom.
The scheme by which the “Student” idea is made available to other
problems consists in constructing a variable $t$ in the nature of a
fraction whose numerator is any statistic normally distributed and whose
denominator is the square root of an independently distributed and
unbiased estimate of the variance of the numerator involving $n$
degrees of freedom. Thus the $t$-distribution has been found useful
in such problems as testing the significance of the difference
between two means and testing hypotheses regarding regression
coefficients.

Let $\bar{x}_1$, $\bar{x}_2$ be the means and $s_1$, $s_2$ the standard deviations of two
independent samples of $N_1$ and $N_2$ variates, respectively, from a
normal universe with mean $\tilde{x}$ and variance $\sigma^2$. According to (10) of
Chapter VI the variance of the difference between the two means is
$\sigma^2(N_1 + N_2)/N_1N_2$. Then it can be proved that the variable

(35)    $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma}\left(\dfrac{N_1N_2}{N_1 + N_2}\right)^{1/2}$

is normally distributed with unit standard deviation. However, in
most practical problems $\sigma$ is unavailable and must be estimated from
the samples. Using the unbiased estimate defined in (5), the above
formula becomes

(36)    $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\hat{\sigma}}\left(\dfrac{N_1N_2}{N_1 + N_2}\right)^{1/2}.$

Fisher showed that (36) is distributed in accord with (32) for
$n = N_1 + N_2 - 2$, and we can find from Fisher's table of $P$ the
probability of a greater difference between the means than that observed.

As $N_1$ and $N_2$ become large, $(N_1 + N_2)/(N_1 + N_2 - 2)$ tends
toward unity and (36) tends toward the value

(37)    $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\left(\dfrac{s_1^2}{N_2} + \dfrac{s_2^2}{N_1}\right)^{1/2}}.$

Since (36) is asymptotically normally distributed, the older procedure
of referring (37) to a normal probability scale in testing a null
hypothesis that two samples are from the same universe would not be
invalid to any appreciable extent for large values of $N_1$ and $N_2$.
The present writer has recently called attention to an erroneous
formula which is commonly used in place of (37).

If one of the samples, say $N_2$, is so much larger than the other that
it tends toward the universe, then $\bar{x}_2$ tends toward $\tilde{x}$ and $s_2$ tends
toward $\sigma$. So, under these conditions, (37) tends toward

$t = \dfrac{(\bar{x}_1 - \tilde{x})\sqrt{N_1}}{\sigma},$

which, if the subscripts are dropped, is the formula used in testing a
null hypothesis that a given sample comes from a normal universe
with a proposed mean. When $N_1 = N_2 = N$, (36) reduces to

(38)    $t = (\bar{x}_1 - \bar{x}_2)\left(\dfrac{N - 1}{s_1^2 + s_2^2}\right)^{1/2}.$

Inasmuch as we do not ordinarily know whether a sample is drawn
from a normal universe or some other type of universe, a question
quite naturally arises as to whether the procedure inaugurated by
“Student” and extended by Fisher is applicable to small samples
from non-normal universes. The question may be considered
partially answered by Bartlett and others who have shown that it
gives a good approximation for considerable departures from
normality in the sampled universe. However, a word of caution seems
to be in order lest the new procedure be oversold in the applications
by completely neglecting the underlying assumptions of normality
in the universe and randomness of the samples.

The following examples, cited by Rietz, illustrate the “Student”
theory.

Example 1. Suppose a random sample of $N = 5$ is obtained from a
hypothetical normal universe whose mean is $\tilde{x} = 2$. It is found that $\bar{x} = 3$ and
$s^2 = 4/5$ for the sample. What is the probability that one would obtain a sample
of five for which $\bar{x}$ would differ numerically from $\tilde{x}$ by as much as unity?

Solution. From (12), $t = \sqrt{5} = 2.236$. Entering Fisher's table for $n = 4$,
we find the probability $P$ between .1 and .05. Reference to the more extensive
table in Metron gives $P = .0892$ for the probability of a discrepancy as large
as the one observed. It is interesting to compare this result with what would
be obtained by reference to a normal probability scale. We find $P = .0254$ for
a deviation outside $t = \pm 2.236$. In terms of the odds that a mean, $\bar{x}$, will
deviate numerically more than 1 from theory, the contrast is more striking.
Thus, under the “Student” theory we should say that the odds are 10,000 to
892 or roughly 11 to 1 against a deviation as large as or larger than the one
observed. Under the normal theory the odds are 10,000 to 254, or about 40 to
1 against such a deviation.
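The two probabilities compared in Example 1 may be recomputed directly from the ordinate of the $t$-distribution. The sketch below integrates the density with $n = 4$ by Simpson's rule; this numerical method, the step count, and the function names are our own choices for illustration, not the procedure by which the published tables were made.

```python
from math import erf, exp, lgamma, log, pi, sqrt


def t_density(t, n):
    """Ordinate of the t-distribution with n degrees of freedom."""
    log_k = lgamma((n + 1) / 2.0) - lgamma(n / 2.0) - 0.5 * log(n * pi)
    return exp(log_k) * (1.0 + t * t / n) ** (-(n + 1) / 2.0)


def two_sided_tail(t, n, steps=2000):
    """Probability of a deviation outside -t, +t; 'steps' must be even."""
    h = t / steps
    total = t_density(0.0, n) + t_density(t, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, n)   # Simpson weights
    inside_half = total * h / 3.0                            # integral from 0 to t
    return 1.0 - 2.0 * inside_half


def normal_two_sided_tail(t):
    """Same probability on the normal scale with unit standard deviation."""
    return 1.0 - erf(t / sqrt(2.0))


t = sqrt(5.0)                       # value of (12) in Example 1
p_student = two_sided_tail(t, 4)    # n = N - 1 = 4 degrees of freedom
p_normal = normal_two_sided_tail(t)
print(round(p_student, 4), round(p_normal, 4))
```

The first figure comes out at about .089, in close agreement with the tabled value quoted above, and the second at about .025.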

Example 2. The following data represent the yields in bushels of Indian corn on ten subdivisions of equal areas of two agricultural plots in which Plot 1 was a control plot treated the same as Plot 2, except for the amount of phosphorus applied as a fertilizer.


             Plot 1      Plot 2
              6.2         5.6
              5.7         5.9
              6.5         5.6
              6.0         5.7
              6.3         5.8
              5.8         5.7
              5.7         6.0
              6.0         5.5
              6.0         5.7
              5.8         5.5
   Totals    60.0        57.0
   N          10          10
   Means     $\bar{x}_1 = 6.0$    $\bar{x}_2 = 5.7$


Is there a significant difference between the yields on the two plots, using the 
difference between their means as a criterion of judgment? 


Solution.

$s_1^2 = \dfrac{.64}{10} = .064, \qquad s_2^2 = \dfrac{.24}{10} = .024.$

Substitution in (38) gives

$t = (.3)(10.113) = 3.034.$

Entering “Student's” tables in Metron (loc. cit.) at $n = 18$, we find P = .0072 for the probability that t will fall outside the range -3.034 to +3.034. Hence a null hypothesis that the samples are from the same universe would be refuted by the test for both the .05 and .01 levels of significance. In other words, our conclusion is that, on the levels of significance adopted, there is a significant difference between the yields on the plots.
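
The arithmetic of Example 2 can be retraced from the raw yields. The sketch below assumes that (38) is the usual two-sample t with a pooled variance estimate on $N_1 + N_2 - 2$ degrees of freedom, an assumption that reproduces the printed value of t. The function names are hypothetical.

```python
import math

plot1 = [6.2, 5.7, 6.5, 6.0, 6.3, 5.8, 5.7, 6.0, 6.0, 5.8]
plot2 = [5.6, 5.9, 5.6, 5.7, 5.8, 5.7, 6.0, 5.5, 5.7, 5.5]

def mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Variance with divisor N (the s^2 of the text, not the unbiased estimate)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def pooled_t(xs, ys):
    """Two-sample t using a pooled variance on N1 + N2 - 2 degrees of freedom."""
    n1, n2 = len(xs), len(ys)
    pooled = (n1 * sample_variance(xs) + n2 * sample_variance(ys)) / (n1 + n2 - 2)
    return (mean(xs) - mean(ys)) / math.sqrt(pooled * (1 / n1 + 1 / n2))

s1_sq = sample_variance(plot1)
s2_sq = sample_variance(plot2)
t = pooled_t(plot1, plot2)
print(round(s1_sq, 3), round(s2_sq, 3), round(t, 3))
```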

11. Fisher's z-Distribution. Suppose $u^2$ and $v^2$ are two independent and unbiased estimates of the variance of a variable x which is normally distributed. If these estimates are based upon samples of $N_1$ and $N_2$ respectively, or upon $n_1$ and $n_2$ degrees of freedom, then we have

$u^2 = \dfrac{N_1}{N_1 - 1}\, s_1^2 = \dfrac{1}{n_1} \sum_{i=1}^{n_1+1} (x_{1i} - \bar{x}_1)^2$

$v^2 = \dfrac{N_2}{N_2 - 1}\, s_2^2 = \dfrac{1}{n_2} \sum_{i=1}^{n_2+1} (x_{2i} - \bar{x}_2)^2$

in which $\bar{x}_1$ and $\bar{x}_2$ are the means of the two samples. In previous notation $u^2$ and $v^2$ would be denoted by $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$, but these symbols are too unwieldy in the present discussion.

In constructing a test of significance for the difference between two sample variances it might seem logical to form the difference $w = u^2 - v^2$ and seek the distribution function of w. However, such a procedure is impractical because of the mathematical difficulty involved in determining this function. Fisher circumvented this difficulty by building a statistic, z, defined by

(39)   $z = \tfrac{1}{2}(\log_e u^2 - \log_e v^2) = \log_e \dfrac{u}{v}$

whose distribution function, G(z), he obtained and which proved to have extremely wide application. To derive G(z) we make use of the distribution of $H(s^2)$ given in (25), replacing $N - 1$ by n and $s^2$ by $\dfrac{n}{n+1}\, u^2$ (see § 3). After this modification, (25) becomes


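(40)   $H(u^2)\, d(u^2) = \dfrac{n^{n/2}}{2^{n/2}\, \sigma^{n}\, \Gamma\!\left(\frac{n}{2}\right)}\, (u^2)^{(n-2)/2}\, e^{-n u^2/2\sigma^2}\, d(u^2).$

Since $u^2$ and $v^2$ are independent their joint distribution is

(41)   $K\, (u^2)^{(n_1-2)/2}\, (v^2)^{(n_2-2)/2}\, e^{-(n_1 u^2 + n_2 v^2)/2\sigma^2}\, d(u^2)\, d(v^2)$

where

$K = \dfrac{n_1^{\,n_1/2}\, n_2^{\,n_2/2}}{2^{(n_1+n_2)/2}\, \sigma^{\,n_1+n_2}\, \Gamma\!\left(\frac{n_1}{2}\right) \Gamma\!\left(\frac{n_2}{2}\right)}.$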


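From (39) we have

(42)   $u^2 = v^2 e^{2z}$

and for a fixed value of $v^2$

(43)   $d(u^2) = 2 v^2 e^{2z}\, dz.$

Using (42) and (43) in (41) we obtain

(44)   $2K\, e^{n_1 z}\, (v^2)^{(n_1+n_2-2)/2}\, e^{-(n_1 e^{2z} + n_2)\, v^2/2\sigma^2}\, d(v^2)\, dz$

for the joint distribution of $v^2$ and z. Integrating (44) with respect to $v^2$ between the limits 0 and $\infty$ and making use of the Gamma function we obtain the distribution of z,

(45)   $G(z) = \dfrac{2\, n_1^{\,n_1/2}\, n_2^{\,n_2/2}}{B\!\left(\frac{n_1}{2}, \frac{n_2}{2}\right)} \cdot \dfrac{e^{n_1 z}}{(n_1 e^{2z} + n_2)^{(n_1+n_2)/2}}.$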


The function G(z) has the important property that it depends solely upon $n_1$ and $n_2$, not at all upon the variance of the sampled universe. Fisher's z should not be confused with the z-distribution of “Student.”

The distribution function of z is extremely general, including as special cases the $\chi^2$-distribution, the t-distribution of “Student” and Fisher, and the normal distribution. Rider has made easily available the transformations and substitutions by which these special cases can be obtained from (45).

The positive part of the curve for $z = \log_e (u/v)$ is the same as the negative part for $z = \log_e (v/u)$. Since it is optional which estimate is considered as $u^2$, it is necessary, in tabulating the probability integral of G(z), to consider only positive values of z by making $u^2$ the larger variance estimate (based on $n_1$ degrees of freedom).

Let $Q = \int_{-\infty}^{z_0} G(z)\, dz$ and let $P = 1 - Q$. Thus P is the probability that $z > z_0$. In his book, Fisher has given values of $z_0$ corresponding to the probabilities P = .05 and .01 for various combinations of $n_1$ and $n_2$. These values, $z_0$, are called the “5% and 1% points” and are used as critical values in judging significance. It should be noticed that Fisher's “points” are based on the area of the whole curve and therefore they should not be confused with the “5% and 1% levels of significance” previously used. In the latter sense, Fisher's “points” would be 10% and 2% “levels of significance.” In other words, a 5% point means a value of z such that one “tail” under the curve is .05, whereas a 5% level of significance meant a value of t such that the sum of both “tails” (outside $\pm t$) is .05. It is hoped that tables of 5% and 1% levels of significance for z will sometime be available.
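
The meaning of a one-tail “point” can be illustrated by integrating G(z) numerically. The sketch below assumes the usual closed form of Fisher's z density, $G(z) = 2 n_1^{n_1/2} n_2^{n_2/2} e^{n_1 z} / \{B(n_1/2, n_2/2)(n_1 e^{2z} + n_2)^{(n_1+n_2)/2}\}$, and uses the value F = 3.18, which Example 3 quotes as Snedecor's 5% entry for $n_1 = n_2 = 9$. The function names and the integration cutoff are hypothetical choices.

```python
import math

def fisher_z_density(z, n1, n2):
    """G(z), evaluated through logarithms to avoid overflow."""
    a, b = n1 / 2, n2 / 2
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_g = (math.log(2) + a * math.log(n1) + b * math.log(n2) - log_beta
             + n1 * z - (a + b) * math.log(n1 * math.exp(2 * z) + n2))
    return math.exp(log_g)

def simpson(f, lo, hi, steps=4000):
    """Composite Simpson's rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def upper_tail(z0, n1, n2, cutoff=8.0):
    """P = probability that z > z0; the area beyond `cutoff` is treated as negligible."""
    return simpson(lambda z: fisher_z_density(z, n1, n2), z0, cutoff)

z0 = 0.5 * math.log(3.18)          # z corresponding to F = 3.18
p_one_tail = upper_tail(z0, 9, 9)  # area of the single upper tail
total_area = simpson(lambda z: fisher_z_density(z, 9, 9), -8.0, 8.0)
print(round(z0, 3), round(p_one_tail, 3), round(total_area, 3))
```

The single upper tail beyond $z_0$ comes out close to .05 while the whole curve integrates to unity, which is the sense in which Fisher's 5% point is a one-tail value.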




Small or Exact Sampling Theory 


12. Significance of Difference Between Variances. The usual hypothesis tested by the z-test is that $u^2$ and $v^2$ are estimates of one and the same population variance and therefore that $z = 0$. The significance of the divergence of the observed value of z from zero is the crux of the test. Small values of z mean a tenable hypothesis whereas values of z larger than $z_0$ refute the hypothesis. If for P = .05 (or .01) the observed value of z, as computed from the samples in accordance with (39), is larger than $z_0$, the hypothesis is to be rejected and the conclusion is that the samples come from universes with different variances.

Logically, the z-test should be applied before testing the difference between two means since the latter test depends on the equality of the population variances.

To avoid the troublesome logarithmic computation involved in (39) Snedecor has published tables which transform Fisher's 5% and 1% points into the ratio $u^2/v^2$. Snedecor calls this ratio F in honor of Fisher.* Therefore,

$F = \dfrac{u^2}{v^2} = e^{2z}$

where $u^2$ is to be chosen the larger of the two given variance estimates. This table is reproduced in the Appendix. (See Table II.)


Example 3. In Example 2 suppose we wish to test the assumption, which was made there, that the two samples come from universes with equal variance. We have

$u^2 = \dfrac{n_1 + 1}{n_1}\, s_1^2 = .0711$

$v^2 = \dfrac{n_2 + 1}{n_2}\, s_2^2 = .0267$

$F = \dfrac{.0711}{.0267} = 2.663$

$z = .5 \log_e F = 1.1513 \log_{10} F = .49.$

Entering Fisher's table (loc. cit.) for $n_1 = n_2 = 9$ we find $z_0 = .58$ for P = .05 and $z_0 = .84$ for P = .01. This means that, if the true value of z were zero, random sampling fluctuations would be expected to give a value of z as great as .84, or greater, once in 100 trials, and a value of z as great as .58, or greater, five times in 100 trials. The observed value of z is .49 and so this value might be accounted for by chance, at either the .05 or .01 points of significance. Using Snedecor's table we find F = 3.18 for P = .05 and F = 5.35 for P = .01. Since the observed value of F is only 2.663, we conclude that we were justified in proceeding with the t-test.

* In their new Statistical Tables Fisher and Yates call it the variance ratio. These tables are published by Oliver and Boyd, London.
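
The computation in Example 3 is short enough to retrace directly. This is a sketch under the assumption that the unbiased estimates are obtained from the sample variances of Example 2 ($s_1^2 = .064$, $s_2^2 = .024$, each on 9 degrees of freedom) by the factor $(n+1)/n$. The helper name `unbiased_from_s2` is hypothetical.

```python
import math

def unbiased_from_s2(s2, n):
    """Convert s^2 (divisor N = n + 1) to the unbiased estimate on n degrees of freedom."""
    return (n + 1) / n * s2

u2 = unbiased_from_s2(0.064, 9)    # larger estimate (Plot 1)
v2 = unbiased_from_s2(0.024, 9)    # smaller estimate (Plot 2)
F = u2 / v2                        # variance ratio
z = 0.5 * math.log(F)              # Fisher's z by natural logarithms
z_from_log10 = 1.1513 * math.log10(F)   # the same value by common logarithms

exceeds_5pct = F > 3.18            # 5% entry quoted in the text for (9, 9)
exceeds_1pct = F > 5.35            # 1% entry quoted in the text for (9, 9)
print(round(u2, 4), round(v2, 4), round(F, 3), round(z, 2))
```

Carrying the unrounded estimates through gives F near 2.667 rather than 2.663; the small difference arises from rounding the two estimates to .0711 and .0267 before dividing, and it does not affect the conclusion.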

When the samples are large there are two procedures available.

I. G(z) is skew when $n_1 \neq n_2$ but when $n_1 = n_2$ it is symmetrical. When $n_1$ and $n_2$ are large, and also for moderate values when they are equal or nearly equal, one can verify (by taking logarithms) that z is approximately normally distributed with mean zero and variance $\tfrac{1}{2}(1/n_1 + 1/n_2)$. Therefore,

(46)   $\dfrac{z}{\sqrt{\tfrac{1}{2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

may be referred to a normal probability scale.

II. Let $w = s_1 - s_2$. From (10) of Chapter VI and (29) of this chapter,

$\sigma_w^2 = \sigma^2 \left( \dfrac{1}{2N_1} + \dfrac{1}{2N_2} \right).$

Then

(47)   $\dfrac{s_1 - s_2}{\sigma \sqrt{\dfrac{1}{2N_1} + \dfrac{1}{2N_2}}}$

is normally distributed about zero with unit standard deviation. An estimate of the supposed common variance is given in (5). Using the square root of this estimate in place of $\sigma$ in (47) and assuming that $N_1$ and $N_2$ are large enough to regard, without appreciable error, the ratio $(N_1 + N_2)/(N_1 + N_2 - 2)$ as unity we obtain

(47a)   $t = \dfrac{s_1 - s_2}{\sqrt{\dfrac{s_1^2}{2N_2} + \dfrac{s_2^2}{2N_1}}}.$

This value may then be referred to a normal probability scale.

An interesting derivation, using characteristic functions, of a method for testing the significance of the difference between two sample variances has recently been given by A. T. Craig.

13. Analysis of Variance. The test of significance between two independent sample variances (with their appropriate degrees of freedom) is a special case of a general technique, developed by Fisher, for segregating the variance into portions traceable to specific sources. In general, the kind of procedure one attempts to follow in such an analysis can be illustrated by the following scheme.

Let us imagine $a$ individuals $I_1, I_2, \ldots, I_a$, each subjected to $b$ treatments $T_1, T_2, \ldots, T_b$. For example, the I's may be agricultural plots containing different varieties of some plant and the T's may be applications of various kinds or amounts of fertilizers. Or the I's might conceivably be various diabetic patients and the T's varietal insulin treatments. The effects of the T's on the I's yield a set of observations, to be denoted by $x_{jk}$, which vary from one value of I to another for a fixed T and from one value of T to another for a fixed I. Suppose, then, that $N = ab$ independently observed values of a normally distributed variable are classified into $a$ rows and $b$ columns in accordance with some relevant scheme as depicted in Table 13.


Table 13. Matrix of N = ab Independent Values from a Normal Universe

            $T_1$       $T_2$       ...     $T_b$
   $I_1$    $x_{11}$    $x_{12}$    ...     $x_{1b}$
   $I_2$    $x_{21}$    $x_{22}$    ...     $x_{2b}$
   ...      ...         ...         ...     ...
   $I_a$    $x_{a1}$    $x_{a2}$    ...     $x_{ab}$


The values in each row will vary about the mean of that row and the values in each column will vary about the mean of that column. Let $\bar{x}_{j.}$ denote the mean of the jth row,

(48)   $b\,\bar{x}_{j.} = \sum_{k=1}^{b} x_{jk}, \qquad j = 1, 2, \ldots, a,$

and let $\bar{x}_{.k}$ denote the mean of the kth column,

(49)   $a\,\bar{x}_{.k} = \sum_{j=1}^{a} x_{jk}, \qquad k = 1, 2, \ldots, b.$

(The dot indicates that summation has been effected on the index which it replaces.) Let the mean of the entire set be $\bar{x}$ where

(50)   $ab\,\bar{x} = \sum_{j=1}^{a} \sum_{k=1}^{b} x_{jk}.$

Let the variance in the entire set due to all causes be $Q/ab$ where

(51)   $Q = \sum_{j=1}^{a} \sum_{k=1}^{b} (x_{jk} - \bar{x})^2.$

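Now Q can be resolved into three quadratic forms as follows:

(52)   $Q = q_1 + q_2 + q_3$

where

$q_1 = b \sum_{j=1}^{a} (\bar{x}_{j.} - \bar{x})^2$

$q_2 = a \sum_{k=1}^{b} (\bar{x}_{.k} - \bar{x})^2$

$q_3 = \sum_{j=1}^{a} \sum_{k=1}^{b} (x_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x})^2.$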

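That (52) is an identity in the $N = ab$ values of x can be readily seen as follows:

$\sum_{j=1}^{a} \sum_{k=1}^{b} (x_{jk} - \bar{x})^2 = \sum_{j=1}^{a} \sum_{k=1}^{b} \{(x_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x}) + (\bar{x}_{j.} - \bar{x}) + (\bar{x}_{.k} - \bar{x})\}^2$

$= \sum_{j=1}^{a} \sum_{k=1}^{b} (x_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x})^2 + \sum_{j=1}^{a} \sum_{k=1}^{b} (\bar{x}_{j.} - \bar{x})^2 + \sum_{j=1}^{a} \sum_{k=1}^{b} (\bar{x}_{.k} - \bar{x})^2.$

To show that the cross-product terms vanish consider the term

$\sum_{j=1}^{a} \sum_{k=1}^{b} (x_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x})(\bar{x}_{j.} - \bar{x}).$

This becomes

$\sum_{j=1}^{a} (\bar{x}_{j.} - \bar{x}) \sum_{k=1}^{b} (x_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x}) = \sum_{j=1}^{a} (\bar{x}_{j.} - \bar{x})(b\bar{x}_{j.} - b\bar{x}_{j.} - b\bar{x} + b\bar{x}) = 0.$

A similar demonstration can be made for the other cross-product terms. This is left as an exercise for the student. Since

$\sum_{j=1}^{a} \sum_{k=1}^{b} (\bar{x}_{j.} - \bar{x})^2 = b \sum_{j=1}^{a} (\bar{x}_{j.} - \bar{x})^2 \quad \text{and} \quad \sum_{j=1}^{a} \sum_{k=1}^{b} (\bar{x}_{.k} - \bar{x})^2 = a \sum_{k=1}^{b} (\bar{x}_{.k} - \bar{x})^2,$

(52) is established.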

The variability between rows is measured by $q_1$ and between columns by $q_2$. The residual variability, freed from the influence of either rows or columns, is measured by $q_3$ and is called interaction (sometimes also discrepance). It may be regarded as the “experimental error” inherent in the experiment and over which no control is attempted. As will be shown later, it is used as a standard against which the variability measured by either $q_1$ or $q_2$ may be tested for significance, when the appropriate number of degrees of freedom are taken into account.

From (51) the number of degrees of freedom in Q is seen to be $N - 1$. Since there are $a$ values of $\bar{x}_{j.}$ the number of degrees of freedom in $q_1$ is $(a - 1)$. Similarly, the number in $q_2$ is $(b - 1)$. This leaves $(N - 1) - \{(a - 1) + (b - 1)\} = (a - 1)(b - 1)$ for $q_3$, a result which may also be deduced from the expression for $q_3$. Another form of argument is as follows. The $ab$ means of rows and columns form an $(a \times b)$-fold table of $(a - 1)(b - 1)$ degrees of freedom since the marginal means are fixed in terms of the $x_{jk}$ values. Anyhow, the number of degrees of freedom in interaction is the product of the numbers in the interacting forces. Accordingly, an unbiased estimate of $\sigma^2$ from the rows is $q_1/(a - 1)$, from the columns is $q_2/(b - 1)$, and from interaction is $q_3/\{(a - 1)(b - 1)\}$. It is clear, therefore, that the z-distribution can be employed to test the significance of the variability attributable to these sources if the independence of the above-mentioned estimates is assured. A. T. Craig has settled this point by establishing the independence of the q's.

The quantities required in an analysis of variance are summarized in Table 14. They can be readily computed except possibly $q_3$. So long as the arithmetic involved in computing the other quantities is carefully checked it is sufficient to evaluate $q_3$ from relation (52). In other words, the sum of squares due to interaction may be found by subtracting $(q_1 + q_2)$ from the total sum of squares.

Table 14

   Variance due to    D. of F.          Sum of Squares                                        Unbiased Estimates
   Rows               $a - 1$           $q_1 = b \sum_{j=1}^{a} (\bar{x}_{j.} - \bar{x})^2$    $q_1/(a - 1)$
   Columns            $b - 1$           $q_2 = a \sum_{k=1}^{b} (\bar{x}_{.k} - \bar{x})^2$    $q_2/(b - 1)$
   Interaction        $(a - 1)(b - 1)$  $q_3 = Q - q_1 - q_2$                                  $q_3/\{(a - 1)(b - 1)\}$
   Total              $N - 1$           $Q = \sum\sum (x_{jk} - \bar{x})^2$



Under the null hypothesis that there is no significant variation from row to row, the quantity

(53)   $z = \tfrac{1}{2} \log_e \dfrac{(b - 1)\, q_1}{q_3}$

will be distributed in accord with (45) and the hypothesis can be tested from critical values of z or, more conveniently, perhaps, from Snedecor's table by computing

(54)   $F = \dfrac{(b - 1)\, q_1}{q_3}$

and entering the table at $(n_1, n_2)$ where $n_1 = a - 1$ and $n_2 = (a - 1)(b - 1)$. If the computed value falls above the critical value adopted, the null hypothesis is rejected for that value. Similarly, to test the null hypothesis that there are no significant effects from column to column we compute

(55)   $F = \dfrac{(a - 1)\, q_2}{q_3}$

and compare it with one of the tabular entries for $n_1 = b - 1$, $n_2 = (a - 1)(b - 1)$.


Example 4. On a feeding experiment a farmer has four types of hogs denoted by I, II, III, IV. These types are each divided into three groups which are fed varietal rations A, B, and C. The following results are obtained, the numbers in the table being the gains in weight in pounds in the various groups.

               I        II       III      IV       Totals
   A           7.0      16.0     10.5     13.5      47.0
   B          14.0      15.5     15.0     21.0      65.5
   C           8.5      16.5      9.5     13.5      48.0
   Totals     29.5      48.0     35.0     48.0     160.5


The computations yield the following results:

                  Sum of Squares    D. of F.    Unbiased Estimates
   Rations            54.1250          2             27.06
   Types              87.7292          3             29.24
   Interaction        28.2083          6              4.70


To test the significance of the variation in rations we refer F = 27.06/4.70 = 5.76 to Snedecor's table where, corresponding to (2, 6) degrees of freedom, we find 5.14 for the 5% point and 10.92 for the 1% point. Similarly, to test the significance of the variation between types we compute F = 29.24/4.70 = 6.2. The entries in the table for (3, 6) degrees of freedom are 4.76 for the 5% point and 9.78 for the 1% point. Our conclusion is that there is a significant difference between breeds and, by a narrower margin, between varieties of rations at the 5% point, but that neither is significant at the 1% point.
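
The sums of squares in Example 4 follow directly from the definitions of $q_1$, $q_2$, $q_3$, and Q. The sketch below recomputes them from the table of gains, taking rations as rows ($a = 3$) and types as columns ($b = 4$). The function name `two_way_anova` and the dictionary keys are hypothetical.

```python
gains = [
    [7.0, 16.0, 10.5, 13.5],    # ration A, types I to IV
    [14.0, 15.5, 15.0, 21.0],   # ration B
    [8.5, 16.5, 9.5, 13.5],     # ration C
]

def two_way_anova(table):
    """Resolve the total sum of squares Q into rows, columns, and interaction."""
    a, b = len(table), len(table[0])
    grand = sum(map(sum, table)) / (a * b)
    row_means = [sum(row) / b for row in table]
    col_means = [sum(table[j][k] for j in range(a)) / a for k in range(b)]
    Q = sum((x - grand) ** 2 for row in table for x in row)
    q1 = b * sum((m - grand) ** 2 for m in row_means)
    q2 = a * sum((m - grand) ** 2 for m in col_means)
    # Interaction computed from its own definition, not by subtraction,
    # so that Q = q1 + q2 + q3 serves as a check on the arithmetic.
    q3 = sum((table[j][k] - row_means[j] - col_means[k] + grand) ** 2
             for j in range(a) for k in range(b))
    return {"Q": Q, "q1": q1, "q2": q2, "q3": q3,
            "F_rows": (b - 1) * q1 / q3,    # on (a - 1, (a - 1)(b - 1)) d.f.
            "F_cols": (a - 1) * q2 / q3}    # on (b - 1, (a - 1)(b - 1)) d.f.

result = two_way_anova(gains)
for name, value in result.items():
    print(name, round(value, 4))
```

Computing $q_3$ from its definition rather than by subtraction gives an independent check that the three parts add up to Q.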

14. Testing Variation in Sub-sets of Means. In a previous chapter a method was given for testing the significance of a difference between two means. We shall now show that the analysis of variance technique lends itself to testing the significance of differences between any number of group means.

Consider normal universes with means $\tilde{y}_x$ $(x = 1, 2, \ldots, b)$, and variance $\sigma^2$. Let samples of $N_x$ be drawn one from each of these universes and let $\bar{y}_x$ and $s_x^2$ be the mean and variance of the sample of $N_x$. Thus we have $b$ classes or arrays (as in a correlation table). The notation for the samples is summarized in Table 15.

Table 15

   Classes                 1             2             ...    x             ...    b
   Means                   $\bar{y}_1$   $\bar{y}_2$   ...    $\bar{y}_x$   ...    $\bar{y}_b$
   Standard Deviations     $s_1$         $s_2$         ...    $s_x$         ...    $s_b$
   Frequencies             $N_1$         $N_2$         ...    $N_x$         ...    $N_b$


Our problem is to test, from the samples, the hypothesis that $\tilde{y}_1 = \tilde{y}_2 = \cdots = \tilde{y}_b$.

It can be shown (Cf. Part I; Ex. 3, p. 208) that the sum of the squares of deviations of the $N = \sum_{x=1}^{b} N_x$ variates y from the mean $\bar{y}$ of the entire set may be broken up into two parts such that

$v = v_1 + v_2$

where

$v = \sum (y - \bar{y})^2$

$v_1 = \sum_{x=1}^{b} N_x (\bar{y}_x - \bar{y})^2$

$v_2 = \sum_{x=1}^{b} N_x s_x^2.$

It is conventional to call $v_1$ the variation between classes and $v_2$ the variation within classes.

An unbiased estimate of $\tilde{y}$ is $\bar{y}$ where $N\bar{y} = \sum_{x=1}^{b} N_x \bar{y}_x$. Hence there are $b - 1$ degrees of freedom in $v_1$. An unbiased estimate of $\sigma^2$ from the values of $\bar{y}_x$ is $v_1/(b - 1)$, and from the values of $s_x^2$ is $v_2/(N - b)$ since the variates in the computation of $s_x^2$ are subject to the linear restriction $\sum y = N_x \bar{y}_x$ and there are $b$ values of x. Therefore, under the null hypothesis that $\tilde{y}_1 = \tilde{y}_2 = \cdots = \tilde{y}_b$, the quantity

(56)   $z = \tfrac{1}{2} \log_e \dfrac{(N - b)\, v_1}{(b - 1)\, v_2}$

is distributed in accord with (45) and the hypothesis can be tested by computing

(57)   $F = \dfrac{(N - b)\, v_1}{(b - 1)\, v_2}$

and comparing it with the entries in Snedecor's table for $(n_1, n_2)$ where $n_1 = b - 1$, $n_2 = N - b$. The quantities required in the computations are summarized in Table 16.


Table 16

   Variance due to     D. of F.    Sum of Squares    Unbiased Estimates
   Between Classes     $b - 1$     $v_1$             $v_1/(b - 1)$
   Within Classes      $N - b$     $v_2$             $v_2/(N - b)$
   Total               $N - 1$     $v$               $F = \dfrac{(N - b)\, v_1}{(b - 1)\, v_2}$


The variation within classes is independent of the principle of classification. Therefore, excessive variation between classes (variation of the $\bar{y}_x$'s) as compared with variation within classes (variation of sample values about their respective means) will cause F to fall above the critical value adopted, and the null hypothesis is contradicted or refuted for that value.

Examples from agricultural and certain branches of biological science will be found in the textbooks by Fisher and by Snedecor, and from the field of economics in Mills' text (revised edition).

15. Testing Linear Regression. Consider a correlation table with $b$ arrays in the x direction. Let $f(x)$ represent the frequency and $\bar{y}_x$ the mean in the array at x. Let $(\bar{x}, \bar{y})$ be the mean of the table and $m_1$ and $m_2$ the linear regression coefficients as defined in Part I. Suppose the $N = \sum_x \sum_y f(x, y)$ entries in the table constitute a sample from a normal bivariate universe and we wish to test the hypothesis, H, that the regression of y on x is linear. It is shown in Part I that $Y_x - \bar{y} = m_1 (x - \bar{x})$ is the equation of the line which fits the means of the arrays best, in a least-squares sense, and so $Y_x$ is the estimated mean of the array at x. (A slightly different notation was used in Part I.)

The variation B between arrays can be resolved into two components $B_1$ and $B_2$ such that

(58)   $B = B_1 + B_2$

where

$B = \sum_{x=1}^{b} f(x)(\bar{y}_x - \bar{y})^2$

$B_1 = \sum_{x=1}^{b} f(x)(\bar{y}_x - Y_x)^2$

$B_2 = \sum_{x=1}^{b} f(x)(Y_x - \bar{y})^2.$

To establish (58) we may write B in the form

$\sum_{x=1}^{b} f(x)\{(\bar{y}_x - Y_x) + (Y_x - \bar{y})\}^2$

which upon expansion equals $B_1 + B_2$ because, as the student may verify, the cross-product term vanishes.

It is shown in Part I that $B = N \eta_{yx}^2 \sigma_y^2$ (Cf. (39), p. 200) and $B_2 = N r^2 \sigma_y^2$ (Cf. (16), p. 172). Since $B_2$ is the part of B which is accounted for by H it follows from (58) that $B_1 = N \sigma_y^2 (\eta_{yx}^2 - r^2)$ is the part of B not accounted for by H. We are interested in the question, Is $B_1$ excessive compared with the random sampling fluctuations to be expected under the null hypothesis? To answer this question consider the variation W within arrays where

$W = \sum_x \sum_y f(x, y)(y - \bar{y}_x)^2.$

In Part I this was designated by $N {S'_y}^2$ which in turn is equal to $N \sigma_y^2 (1 - \eta_{yx}^2)$. This variation within classes is due to a host of random forces which are not dependent on the value of x defining the arrays. Therefore, W provides a basis for testing whether $B_1$ is small enough to be accepted as the resultant of random forces under H or whether it is so large as to contradict H. Before we can use the z-test, however, the degrees of freedom must be reckoned. In B there are $b - 1$ degrees of freedom because the $b$ values of $\bar{y}_x$ are subject to the linear restriction $\sum_{x=1}^{b} \bar{y}_x f(x) = N\bar{y}$. The number in $B_2$ may be determined by making use of the regression equation and writing $B_2$ in the form

$\sum_{x=1}^{b} f(x)(Y_x - \bar{y})^2 = m_1^2 \sum_{x=1}^{b} f(x)(x - \bar{x})^2.$

Since $\sum_{x=1}^{b} f(x)(x - \bar{x})^2$ is independent of the regression, the variation in $B_2$ must be due to the single statistic $m_1$ and therefore involve one degree of freedom. Hence, from (58), there are $b - 2$ degrees of freedom in $B_1$. Since there are $b$ arrays there are $N - b$ degrees of freedom in W. Consequently,

$z = \tfrac{1}{2} \log_e \left\{ \dfrac{\eta_{yx}^2 - r^2}{1 - \eta_{yx}^2} \cdot \dfrac{N - b}{b - 2} \right\}$

is distributed in accord with (45) if H is true. The computed value of

$F = \dfrac{\eta_{yx}^2 - r^2}{1 - \eta_{yx}^2} \cdot \dfrac{N - b}{b - 2}$

may, therefore, be compared with one of the entries in Snedecor's table for $n_1 = b - 2$, $n_2 = N - b$.

This is the test which was promised in Part I to replace the Blakeman criterion which Fisher proved was unsound. The student may construct a similar argument for testing an hypothesis of linear regression of x on y.
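
The test of linearity can be sketched from raw arrays, computing B, $B_1$, $B_2$, and W from their definitions instead of from $\eta_{yx}$ and r. The data below are invented for illustration only, and the function name `linearity_test` is hypothetical; the ratio $B_1/(b-2)$ to $W/(N-b)$ is algebraically the same F as the expression in $\eta_{yx}^2$ and $r^2$.

```python
def linearity_test(arrays):
    """arrays maps each x to the list of y values observed in the array at x."""
    xs = sorted(arrays)
    b = len(xs)
    f = {x: len(arrays[x]) for x in xs}                   # array frequencies f(x)
    N = sum(f.values())
    y_bar_x = {x: sum(arrays[x]) / f[x] for x in xs}      # array means
    x_bar = sum(f[x] * x for x in xs) / N
    y_bar = sum(sum(arrays[x]) for x in xs) / N
    # Least-squares regression coefficient of y on x.
    m1 = (sum(f[x] * (x - x_bar) * (y_bar_x[x] - y_bar) for x in xs)
          / sum(f[x] * (x - x_bar) ** 2 for x in xs))
    Y = {x: y_bar + m1 * (x - x_bar) for x in xs}         # estimated array means
    B = sum(f[x] * (y_bar_x[x] - y_bar) ** 2 for x in xs)
    B1 = sum(f[x] * (y_bar_x[x] - Y[x]) ** 2 for x in xs)
    B2 = sum(f[x] * (Y[x] - y_bar) ** 2 for x in xs)
    W = sum((y - y_bar_x[x]) ** 2 for x in xs for y in arrays[x])
    F = (B1 / (b - 2)) / (W / (N - b))                    # on (b - 2, N - b) d.f.
    return B, B1, B2, W, F

# Hypothetical correlation table with four arrays of two observations each.
B, B1, B2, W, F = linearity_test({1: [1, 3], 2: [2, 4], 3: [6, 8], 4: [5, 7]})
print(B, B1, B2, W, F)
```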

16. Tests of Significance of r. Let the variables x, y be simultaneously distributed in accord with some one or other of the distribution functions

$f(x, y) = k e^{-\theta}, \qquad -\infty < x < \infty, \quad -\infty < y < \infty,$

where

$\dfrac{1}{k} = 2\pi \sigma_x \sigma_y (1 - \rho^2)^{1/2}$

$\theta = \dfrac{1}{2(1 - \rho^2)} \left[ \dfrac{(x - \tilde{x})^2}{\sigma_x^2} - \dfrac{2\rho (x - \tilde{x})(y - \tilde{y})}{\sigma_x \sigma_y} + \dfrac{(y - \tilde{y})^2}{\sigma_y^2} \right]$

and $\tilde{x}$, $\tilde{y}$, $\sigma_x$, $\sigma_y$, and $\rho$ are undetermined. In other words, suppose that the universe is some normal bivariate distribution. The question of the reliability of a value of r computed from a sample of N pairs of (x, y) from such a universe may conveniently be discussed under two cases.

Case I. When $\rho = 0$. In testing the significance of an observed value of r we are testing the hypothesis that $\rho = 0$. Under this hypothesis the sampling distribution of r is known to be

$f(r) = k (1 - r^2)^{(N-4)/2}, \qquad -1 \le r \le 1,$

where $1/k = B\!\left(\frac{N-2}{2}, \frac{1}{2}\right)$. The curves represented by this function are symmetrical about $r = 0$ with

$\sigma_r = (N - 1)^{-1/2}.$

As N becomes large the function is practically normal and consequently

(59)   $t = r (N - 1)^{1/2}$

tends to be normally distributed with mean zero and unit standard deviation. Therefore, to test the significance of a value of r computed from a large sample it would not be invalid, to any appreciable extent, to refer (59) to a normal probability scale.

When N is small the problem may be resolved into an analysis of variance. In a correlation table, the total variation in the y direction may be broken up into two parts, (1) the part $N r^2 \sigma_y^2$ which may be accounted for by an hypothesis of linear regression and (2) the residual part $N S_y^2 = N \sigma_y^2 (1 - r^2)$. If there is no real correlation between the two variables then parts (1) and (2) are estimates of the same universe variance. Now to apply the z-test we must have unbiased estimates. There is one degree of freedom in part (1) and $N - 2$ in part (2).

Table 17

                      Variation                                                        D. of F.
   Regression line    $\sum_x f(x)(Y_x - \bar{y})^2 = N r^2 \sigma_y^2$                $1$
   Residuals          $\sum_x \sum_y f(x, y)(y - Y_x)^2 = N (1 - r^2) \sigma_y^2$      $N - 2$
   Totals             $\sum_x \sum_y f(x, y)(y - \bar{y})^2 = N \sigma_y^2$            $N - 1$

Consequently we may test the independence of y and x by computing

(60)   $z = \tfrac{1}{2} \log_e \dfrac{r^2 (N - 2)}{1 - r^2}$

and seeing if it lies beyond the 5% or 1% points in the table for $n_1 = 1$, $n_2 = N - 2$. However, it is conventional to make use of Fisher's t-distribution. It can be shown (see Problem 10) that the distribution of t is a special case of that for z when $n_1 = 1$, $n_2 = n$, and $z = \tfrac{1}{2} \log_e t^2$. Therefore,

(61)   $t = r \left[ \dfrac{N - 2}{1 - r^2} \right]^{1/2}$

is distributed in accord with $F_n(t)$ for $n = N - 2$. In § 11 we observed that the 0.05 level of significance for z is the .025 point. However, when used as an alternative to t, the 0.05 point of z is also the 0.05 level because the whole distribution of t is equivalent to the positive half of the z-distribution in the sense that, for tests of significance, z ranges from 0 to $\infty$ whereas t ranges from $-\infty$ to $\infty$.

Tables are available (Fisher's text, Table V.A.) for applying this test directly from r, giving values of r on four levels of significance represented by P = .10, .05, .02, and .01, for various values of n. It might prove interesting to compare an entry in this table with the corresponding entries in the z and t tables. For example, when $n = 18$ ($N = 20$) we find from this table that $r = .4438$ lies on the P = .05 level, and making the transformation to z by (60) we obtain $z = .7424$ which agrees exactly with the entry in the z-table at the .05 point when $n_1 = 1$, $n_2 = 18$. Finally, when $r = .4438$ in (61), $t = 2.101$ which is the entry in the t-table at the .05 level.
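
The comparison of the r, z, and t tables just described can be retraced numerically. The sketch below takes (60) as $z = \tfrac{1}{2}\log_e\{r^2(N-2)/(1-r^2)\}$ and (61) as $t = r\{(N-2)/(1-r^2)\}^{1/2}$; the function names are hypothetical.

```python
import math

def t_from_r(r, N):
    """Formula (61): t on n = N - 2 degrees of freedom."""
    return r * math.sqrt((N - 2) / (1 - r * r))

def z_from_r(r, N):
    """Formula (60): Fisher's z for n1 = 1, n2 = N - 2."""
    return 0.5 * math.log(r * r * (N - 2) / (1 - r * r))

r, N = 0.4438, 20
t = t_from_r(r, N)
z = z_from_r(r, N)
large_sample = r * math.sqrt(N - 1)   # formula (59); appropriate only for large N
print(round(t, 3), round(z, 4), round(large_sample, 3))
```

Since $z = \tfrac{1}{2}\log_e t^2$, the value of z is simply the natural logarithm of t, which is why the two tables agree at this entry.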

Case II. When $\rho \neq 0$. If the samples are large ($N > 100$) and if $\rho$ is small or only moderately large ($|\rho| < .6$ perhaps) then it is true that r is approximately normally distributed about the value $\rho$ with standard deviation of

$\sigma_r = (1 - \rho^2)(N - 1)^{-1/2}.$

It is customary, under these conditions, to attach to an observed value of r a standard error of

$\sigma_r = (1 - r^2)(N - 1)^{-1/2}$

and, for a proposed $\rho$, to refer the computed value of

$\dfrac{r - \rho}{\sigma_r}$

to a normal probability scale.

This procedure is invalid, however, if N is small and $\rho$ is large. The distribution of r from small samples is skew and the skewness increases with $\rho$. This may be understood intuitively by considering the distribution of r's from a universe in which $\rho$ is .9. The range of possible variation of r above $\rho$ is only .1. But the possible range below $\rho$ is 1.9. Accordingly the sampling distribution of r (N small) from this universe will be sharply skew. An extensive cooperative study of the distribution of r was made in 1917 by Soper and others. They succeeded in finding expressions for its moments and on this basis represented the distribution, for various values of N and $\rho$, by Pearson curves. They also gave an elaborate set of tables of ordinates for values of $\rho$ from 0 to 1 by increments of .1 and for values of r from -1 to +1 by increments of .05. The upper panel of Figure 22 (from Fisher's book) shows the r curves for two values of $\rho$ with $N = 8$, which (presumably) were drawn from the ordinates of these tables. They indicate the rapid departure from normality that may be expected for small samples as $\rho$ approaches high values.

[Fig. 22. Upper panel: distribution curves of r. Lower panel: distribution curves of $z'$ (horizontal axis: value of $z'$ observed).]

In his study of the sampling distribution of the correlation coefficient Fisher found that it was not desirable to use r as the independent variable and he introduced a transformation which has distinctive merits. He showed that the quantity*

(62)   $z' = \tfrac{1}{2} \log_e \dfrac{1 + r}{1 - r}$

is approximately normally distributed and is nearly constant in form as $\rho$ changes. Its mode is always close to $\rho$. The lower panel of Figure 22 shows the distribution curves for $z'$ corresponding to the r curves in the upper panel. The standard deviation is

(63)   $\sigma_{z'} = (N - 3)^{-1/2}$

and is practically independent of $\rho$. The transformation is applicable in the following tests (among others).

(a) To test if an observed value of r differs significantly from a proposed theoretical value, $\rho$.

(b) To test if two observed values are significantly different.

The procedure for (a) is to calculate†

(64)   $t = (z' - z'')(N - 3)^{1/2}$

and refer the result to a normal probability scale. For (b) the procedure is to find, in accordance with (62), the two values of $z'$, say $z'_1$ and $z'_2$, corresponding to the two observed values of r, say $r_1$ and $r_2$, from samples of $N_1$ and $N_2$, respectively. Then compute $d = z'_2 - z'_1$ and $\sigma_d = \{1/(N_1 - 3) + 1/(N_2 - 3)\}^{1/2}$ and refer

$\dfrac{d}{\sigma_d}$

to a normal probability scale.
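
Procedures (a) and (b) can be sketched as follows. The values of r, $\rho$, and N used here are invented for illustration and are not taken from Fisher's examples or from the problems at the end of the chapter. The function names are hypothetical.

```python
import math

def z_prime(r):
    """Fisher's transformation (62); identical to the inverse hyperbolic tangent of r."""
    return 0.5 * math.log((1 + r) / (1 - r))

def deviation_from_rho(r, rho, N):
    """Procedure (a): (z' - z'') multiplied by the square root of N - 3."""
    return (z_prime(r) - z_prime(rho)) * math.sqrt(N - 3)

def difference_of_two(r1, N1, r2, N2):
    """Procedure (b): d / sigma_d. The sign of d does not matter for a two-tail test."""
    d = z_prime(r1) - z_prime(r2)
    sigma_d = math.sqrt(1 / (N1 - 3) + 1 / (N2 - 3))
    return d / sigma_d

def two_tail_normal(t):
    """Probability that a unit normal deviate falls outside (-t, +t)."""
    return math.erfc(abs(t) / math.sqrt(2))

# Hypothetical inputs: r = .5 from 28 pairs, tested against rho = .3,
# and compared with a second sample of 28 pairs giving r = .3.
t_a = deviation_from_rho(0.5, 0.3, 28)
t_b = difference_of_two(0.5, 28, 0.3, 28)
print(round(t_a, 4), round(two_tail_normal(t_a), 3))
print(round(t_b, 4), round(two_tail_normal(t_b), 3))
```

With the same pair of correlations the ratio in (b) is smaller than that in (a), because the difference of two sample values carries the sampling error of both.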

For numerical examples the student is referred to Fisher's book, §§ 33-35. Tables are also available there to facilitate the computation of $z'$ for an assigned r. One should observe that the $z'$ technique is not applicable to the case of simple tests of significance ($\rho = 0$). In that case Fisher's table of t is available.

* This quantity is not quite the same as the z used for the ratio of two variances and so we use a prime here to distinguish between them.

† In (64), $z''$ is the value of (62) when r is replaced by $\rho$.

Three remarks seem appropriate. (1) In computing an r to be tested it is not desirable to apply Sheppard's corrections to $s_x$ and $s_y$ because they tend to increase the value of r. This also applies in testing for linear regression (§ 15). (2) It has been shown that the $z'$ procedure is applicable in testing the significance of partial correlation coefficients if N in (63) is replaced by $N - k$ where k is the number of secondary subscripts in the coefficient. (3) All of the above procedures are strictly valid only for normal universes. However, there is considerable experimental evidence to indicate that they hold for all practical purposes provided the marginal distributions of one or both variables in the universe are not of the J- or U-shaped types. Of course, in those extreme cases one would naturally hesitate to use r as a measure of association.

Problems

1. Derive the expression for the expected value of $s^2$ in repeated samples of N independent observations from an arbitrary universe. Explain the use of this expression in estimating the variance of a universe.

2. In a certain observed distribution, $N = 20$, $\bar{x} = 42$, $s = 5$. Test the hypothesis that this distribution is a random sample from a normal universe with mean of 50.

3. In a certain test, one section of 20 students had an average score of 40 with a standard deviation of 5. Another section of 25 had an average of 46 with standard deviation of 4. Does this indicate a significant difference in the two groups? What assumptions do you make in applying the test?

4. In an experiment in industrial psychology a job was performed by one group of 30 workmen according to Method I and by a second group of 40 according to Method II. (The groups were independent and equally efficient.) Are the following distributions of the time (in seconds) taken such as to justify the conclusion that Method I is the speedier of the two? Use the difference between the means as a criterion of judgment.


   Time       I       II
    50        1        0
    51        3        1
    52        5        2
    53        4        5
    54        7        8
    55        5        9
    56        3        6
    57        1        3
    58        1        3
    59        0        1
    60        0        2
   Totals    30       40


5. From the separate distribution functions of $\bar{x}$ and s derive the distribution of “Student's” z, and from that obtain the function $F_n(t)$.

6. Prove that $F_n(t)$ is asymptotically normally distributed.

7. Derive Fisher's z-distribution, G(z).

8. (Mills' text, revised.) Manufacturing industries were classified into those
producing perishable, semi-durable, and durable goods. An average of
changes occurring between 1929 and 1933 in the selling prices of the
products of each of these categories was computed giving the index numbers
shown in the $\bar{y}_x$ column of the following table.

Class of industry, $x$            Number of industries, $N_x$    Means, $\bar{y}_x$

Producing perishable goods                  34                      69.81
Producing semi-durable goods                26                      66.41
Producing durable goods                     25                      78.96

All industries                              85

Computations: $b - 1 = 2$, $N - b = 82$; sum of squares between classes = 2,161.8800;
sum of squares within classes = 15,564.9040; total, $V$ = 17,726.7840.


Compute F and test the null hypothesis that there was no real difference 
in the price movements of the three different classes of industry for the 
years 1929-1933.

9. Prove that

$$\sum_{x=1}^{2} N_x (\bar{y}_x - \bar{y})^2 = \frac{N_1 N_2}{N_1 + N_2} (\bar{y}_1 - \bar{y}_2)^2.$$

10. Prove that the test for significance between two means is a special case of
the test for significant variation in sub-sets of means by showing that (56)
of § 14 reduces, when $b = 2$, to

$$t = \frac{\bar{y}_1 - \bar{y}_2}{\hat{\sigma}} \left[ \frac{N_1 N_2}{N_1 + N_2} \right]^{1/2},$$

where $\hat{\sigma}$ is an unbiased estimate of $\sigma$ and $t$ is distributed in accord with
$F_n(t)$ for $n = N_1 + N_2 - 2$.

The following three problems are from Fisher's book.

11. For the twenty years 1885-1904, the mean wheat yield of Eastern England
was found to be correlated with the autumn rainfall; the correlation was
found to be -.629. Is this value significant?

12. In a sample of $N = 25$ pairs of parent and child the correlation in a certain
character was found to be .60. Is this value consistent with the view
that the true correlation in that character was .46?

13. Of two samples the first, of 20 pairs, gives a correlation of .6, the second, of
25 pairs, gives a correlation of .8. Are these values significantly different?



CHAPTER VIII 

A. THE $\chi^2$ DISTRIBUTION AND APPLICATIONS


1. The Multinomial Law. The general term of the multinomial
expansion for $k$ mutually exclusive categories sets the stage for a
presentation of $\chi^2$ which provides an insight into the probability
theory of this important quantity and its usefulness in the testing of
hypotheses. So we begin with a preliminary treatment of the
multinomial law.

Consider an event that is characterized by a variable $v$ which can
take on one of $k$ values, $v_1, v_2, \dots, v_k$. Let the probability that $v_i$
occurs be $p_i$, where $\sum_1^k p_i = 1$. Then in $N$ independent trials, the
probability that $v_1$ occurs $m_1$ times, $v_2$ occurs $m_2$ times, and so on,
in a specified order (whatever it may be) is

$$p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k},$$

where $\sum_1^k m_i = N$, the $m$'s being positive integers or zero. The
number of ways in which the order can be specified is the number of
permutations possible among $N$ objects of which $m_1$ are of type $T_1$,
$m_2$ of type $T_2$, $\dots$, $m_k$ of type $T_k$. Let this number be denoted by
$p[m_i]$. Then we have

$$p[m_i] = \frac{N!}{m_1!\, m_2! \cdots m_k!}.$$

Therefore, the probability that $m_1$ of the variates take the value $v_1$,
$m_2$ the value $v_2$, and so on, regardless of order is

(1)    $f(m_1, m_2, \dots, m_k) = p[m_i]\, p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}$,

which is the general term of the expansion of the multinomial

$$(p_1 + p_2 + \cdots + p_k)^N.$$
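The general term (1) can be evaluated directly. The following is a minimal Python sketch, not part of the original text; the function names and the numerical case are chosen here purely for illustration.

```python
from math import comb, factorial


def multinomial_coefficient(counts):
    """Number of orderings p[m_i] = N! / (m_1! m_2! ... m_k!)."""
    coefficient = factorial(sum(counts))
    for m in counts:
        coefficient //= factorial(m)  # each intermediate quotient is an integer
    return coefficient


def multinomial_pmf(counts, probs):
    """General term (1): p[m_i] * p_1^m_1 * p_2^m_2 * ... * p_k^m_k."""
    term = float(multinomial_coefficient(counts))
    for m, p in zip(counts, probs):
        term *= p ** m
    return term


# Illustrative case (values assumed here): N = 4 trials, k = 3 categories.
example = multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2])

# With k = 2 the coefficient is the ordinary binomial coefficient C(N, r).
two_category = multinomial_pmf([3, 7], [0.25, 0.75])
binomial_term = comb(10, 3) * 0.25 ** 3 * 0.75 ** 7
```

Integer arithmetic is used for the coefficient so that no rounding enters before the probabilities are multiplied in.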

The law of repeated trials, for a simple dichotomy, given in Chapter I,
is a special case of this law. Thus if $k = 2$, the right member
of (1) reduces to

(2)    $C(N, r)\, p^r q^{N-r}$,

where

$$r = m_1, \quad N - r = m_2, \quad p = p_1, \quad q = 1 - p_1 = p_2, \quad C(N, r) = N!/(m_1!\, m_2!).$$

If $v$ is the number of spots appearing on the top face in a throw of a
die, then $v$ will take on one of the values 1, 2, 3, 4, 5, 6, and the
probability of throwing exactly $r$ aces (say) in $N$ throws of the die is

$$C(N, r) \left(\tfrac{1}{6}\right)^r \left(\tfrac{5}{6}\right)^{N-r}.$$

We recall that (2) is the general term of the expansion of the
binomial $(q + p)^N$. By using Stirling's approximation for factorials,
we can derive an approximation for (1) which will bear to the
multinomial law a relation analogous to that which the normal curve bears
to the binomial. With this objective in mind, assume that every $m_i$
is sufficiently large for $m_i!$ to be replaced by its Stirling
approximation. Making these replacements (1) becomes, after some algebraic
rearrangement,

(3)    $f(m_1, m_2, \dots, m_k) = \dfrac{\prod_{i=1}^{k} (N p_i / m_i)^{m_i + 1/2}}{(2\pi N)^{(k-1)/2} (p_1 p_2 \cdots p_k)^{1/2}}$.


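Next introduce the transformation

(4)    $t_i = \dfrac{m_i - N p_i}{\sigma_i}$,

$\sigma_i^2$ being $N p_i (1 - p_i)$. Under this transformation (3) becomes

$$(2\pi N)^{(k-1)/2} (p_1 p_2 \cdots p_k)^{1/2} f = \prod_{i=1}^{k} \left( 1 + \frac{\sigma_i t_i}{N p_i} \right)^{-N p_i - \sigma_i t_i - 1/2}.$$

Then

$$\log \text{L.M.} = \sum_{i=1}^{k} \left( -N p_i - \sigma_i t_i - \tfrac{1}{2} \right) \log \left( 1 + \frac{\sigma_i t_i}{N p_i} \right),$$

where L.M. denotes the left-hand member of the preceding equation.
Upon expanding the logarithm in a power series and collecting the
results according to descending powers of $N$, we obtain

$$\log \text{L.M.} = -\sum_{i=1}^{k} \left( \sigma_i t_i + \frac{\sigma_i^2 t_i^2}{2 N p_i} \right) + \text{terms of lower order}.$$

Let each $m_i$ in $\sum_1^k m_i = N$ be transformed in accordance with (4).
The result is

$$\sum_{1}^{k} \sigma_i t_i + N \sum_{1}^{k} p_i = N,$$

whence it follows that $\sum \sigma_i t_i = 0$ since $\sum p_i = 1$. Therefore,
remembering the value of $\sigma_i^2$, (3) may be written

$$f(m_1, m_2, \dots, m_k) = (2\pi N)^{(1-k)/2} (p_1 p_2 \cdots p_k)^{-1/2} e^{-\frac{1}{2} \sum_1^k (1 - p_i) t_i^2}.$$

The form of the exponent of $e$ suggests the substitution of a new
variable $x_i = t_i (1 - p_i)^{1/2}$ in place of $t_i$. Upon making this substitution
we have

(5)    $f(m_1, m_2, \dots, m_k) = (2\pi N)^{(1-k)/2} (p_1 p_2 \cdots p_k)^{-1/2} e^{-\frac{1}{2} \sum_1^k x_i^2}$,

where $x_i = (m_i - N p_i)(N p_i)^{-1/2}$ and $N p_i$ is the mean or expected
value of $m_i$.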
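To see how closely (5) follows the exact term (1), the two can be compared numerically. This is only a sketch under assumed values: the probabilities, the counts, and the function names are illustrative and do not come from the text. The exact term is computed through logarithms of factorials (`math.lgamma`) to avoid overflow.

```python
import math


def exact_multinomial(counts, probs):
    """Exact general term (1), evaluated through logarithms."""
    log_term = math.lgamma(sum(counts) + 1)
    for m, p in zip(counts, probs):
        log_term += m * math.log(p) - math.lgamma(m + 1)
    return math.exp(log_term)


def approx_multinomial(counts, probs):
    """Approximation (5): (2 pi N)^((1-k)/2) (p_1...p_k)^(-1/2) exp(-sum x_i^2 / 2)."""
    n_total = sum(counts)
    k = len(counts)
    sum_x_squared = sum((m - n_total * p) ** 2 / (n_total * p)
                        for m, p in zip(counts, probs))
    return ((2 * math.pi * n_total) ** ((1 - k) / 2)
            * math.prod(probs) ** -0.5
            * math.exp(-sum_x_squared / 2))


# Assumed illustration: N = 60, with every m_i equal to its expected value N p_i.
probs = [0.2, 0.3, 0.5]
counts = [12, 18, 30]
ratio = approx_multinomial(counts, probs) / exact_multinomial(counts, probs)
```

With counts of this size the ratio lies close to 1; it drifts further from 1 as any $m_i$ becomes small, which is the reason for the later requirement that no class contain very few items.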

Now, following Wilks, the $x$'s are independent except for the single
linear restriction $\sum_1^k (N p_i)^{1/2} x_i = 0$. Let $R$ be the region in the
$x$-space subject to the linear restriction just given corresponding to any
region $R_m$ in the $m$-space. Since the $m$'s are always integers, the
change in $x_i$ corresponding to a change of unity in $m_i$ is
$(N p_i)^{-1/2} = \Delta x_i$. Treating $k - 1$ of the $x$'s, say $x_1, x_2, \dots, x_{k-1}$,
as the independent variables, and using an extension of the fundamental
theorem on the existence of a definite integral (Riemann), we have

(6)    $\lim_{N \to \infty} \sum_R \dfrac{e^{-\frac{1}{2} \sum_1^k x_i^2}}{(2\pi)^{(k-1)/2} (p_k)^{1/2}}\, \Delta x_1 \Delta x_2 \cdots \Delta x_{k-1} = \dfrac{1}{(2\pi)^{(k-1)/2} (p_k)^{1/2}} \int_R e^{-\frac{1}{2} \sum_1^k x_i^2}\, dx$,

where for a given $N$, $\sum_R$ denotes the summation over all points in the
region $R$ corresponding to those in $R_m$ for which $f(m_1, m_2, \dots, m_k)$
is defined. The integral is $(k - 1)$-dimensional, and $dx = dx_1\, dx_2 \cdots dx_{k-1}$.
2. The $\chi^2$ Distribution. The quantity

(7)    $\chi^2 = \sum_{1}^{k} x_i^2 = \sum_{1}^{k} \dfrac{(m_i - N p_i)^2}{N p_i}$

is used as an index of the extent to which the set of $m$'s taken as a
whole cluster about their respective expected values. Later on we
will explain the practical import of this index. For the present we
confine our attention to the purely mathematical problem of finding
the distribution function of $\chi^2$. First, we consider the problem of
finding the distribution function of $\chi$. To this end we observe that,
corresponding to different values of $\chi$, (7) defines a set of
$k$-dimensional hyperspheres all having their centers at the origin of the
$x_i$-axes and no two intersecting. Now we can obtain the distribution of $\chi$
by determining the value of the integral in (6) when $R$ consists of the
region bounded by the concentric hyperspheres

(8)    $\sum_1^k x_i^2 = \chi^2$ and $\sum_1^k x_i^2 = (\chi + d\chi)^2$

subject to the condition that

(9)    $\sum_1^k (N p_i)^{1/2} x_i = 0$.

Since this last equation is a hyperplane through the common center
of the hyperspheres, the region $R$ is therefore a "shell" of a
$(k - 1)$-dimensional hypersphere. Within this shell

$$\sum_1^k x_i^2 = \chi^2$$

to within terms of order $d\chi$.

Now it can be shown that the volume $V$ of an $s$-dimensional
hypersphere of radius $r$ is

$$V = C r^s,$$

where $C$ is independent of $r$. The volume between two concentric
hyperspheres of radii $r$ and $r + dr$ is therefore approximately

(10)    $dV = s C r^{s-1}\, dr$.

Returning to the $\chi$ problem, it is clear from (10) that if the region
bounded by the hyperspheres in (8), subject to the restriction given
by (9), is chosen as the element of volume, then the probability that
$\chi$ will lie in the interval from $\chi$ to $\chi + d\chi$ is

(11)    $df = K e^{-\frac{1}{2} \chi^2} \chi^{k-2}\, d\chi$.

Here $K$ is independent of $\chi$ and can be determined by the condition
$\int_0^\infty df = 1$. Using the Gamma function, we find

$$K = \frac{1}{2^{(k-3)/2}\, \Gamma\left( \frac{k-1}{2} \right)}.$$

The distribution of $\chi^2$ is thus given by

(12)    $T_{k-1}(\chi^2)\, d(\chi^2) = \dfrac{(\chi^2)^{(k-3)/2} e^{-\frac{1}{2} \chi^2}}{2^{(k-1)/2}\, \Gamma\left( \frac{k-1}{2} \right)}\, d(\chi^2)$.

The number $k - 1$ is the number of degrees of freedom, which is the
number of $x$'s which are independent in (6).
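As a numerical check on (12), the density can be integrated over its range; the total should be 1, and the mean of $\chi^2$ should equal the number of degrees of freedom $k - 1$ (a standard property, not derived in the text). The sketch below assumes $k = 6$ and uses a simple midpoint rule; the function names, the upper limit of integration, and the step count are arbitrary choices made here.

```python
import math


def chi_square_density(chi2, k):
    """Density (12) of chi-square for k classes, i.e. k - 1 degrees of freedom."""
    n = k - 1
    return (chi2 ** ((k - 3) / 2) * math.exp(-chi2 / 2)
            / (2 ** (n / 2) * math.gamma(n / 2)))


def midpoint_integral(func, lower, upper, steps=20000):
    """Midpoint-rule approximation to the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


k = 6  # assumed number of classes, giving 5 degrees of freedom
total_area = midpoint_integral(lambda u: chi_square_density(u, k), 0.0, 80.0)
mean_value = midpoint_integral(lambda u: u * chi_square_density(u, k), 0.0, 80.0)
```

The upper limit 80 stands in for infinity; for five degrees of freedom the area beyond it is negligible.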

3. Tables. The probability of obtaining a sample of $x$'s for which
$\chi^2$ is greater than an assigned $\chi^2$, say $\chi_0^2$, is given by

(13)    $P(\chi^2 > \chi_0^2) = \int_{\chi_0^2}^{\infty} T_{k-1}(\chi^2)\, d(\chi^2)$.

The symbol on the left in (13) may be abbreviated to $P$ when there is
no ambiguity. It is obvious that $\chi^2$ is never negative and may vary
from 0 (when there is no difference between the observed and
expected frequencies) to very large values. As $\chi_0^2$ increases from 0 to
$\infty$, the probability $P$ given by (13) decreases from 1 to 0. The
student will recognize $T_{k-1}(\chi^2)$ as a Pearson Type III curve and the
integral in (13) as essentially an incomplete Gamma function. Values
of $P$ can be found in Pearson's Tables and we have included in the
Appendix (see Table III) a short table, from Fisher's book, giving
values of $\chi^2$ corresponding to specially selected values of $P$. In our
table, $n = k - 1$.
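Where the tables are not at hand, $P$ in (13) can be computed directly. For a whole number $n$ of degrees of freedom the incomplete Gamma integral reduces to a finite sum; this closed form is a standard result and is not taken from the text. The sketch below is one possible implementation, with a function name chosen here for illustration.

```python
import math


def chi_square_tail(chi2_0, n):
    """P(chi^2 > chi2_0) for n degrees of freedom, n a positive integer."""
    y = chi2_0 / 2
    if n % 2 == 0:
        # Even n: exp(-y) times the first n/2 terms of the series for exp(y).
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(n // 2))
    # Odd n: begin with the tail for one degree of freedom, erfc(sqrt(y)),
    # then add one term for each further pair of degrees of freedom.
    tail = math.erfc(math.sqrt(y))
    for j in range(1, (n - 1) // 2 + 1):
        tail += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return tail


# Two familiar 5 per cent points, n = 1 and n = 2.
p_one = chi_square_tail(3.841, 1)
p_two = chi_square_tail(5.991, 2)
```

Both values come out very near .05, in agreement with the usual tabulated 5 per cent points.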

For fairly large values of $k$, $(2\chi^2)^{1/2}$ is approximately normally
distributed about a mean $(2k - 1)^{1/2}$ with unit standard deviation.
Therefore, one may refer

$$t = (2\chi^2)^{1/2} - (2k - 1)^{1/2}$$

to a normal probability scale when $k > 30$.

4. Applications. The $\chi^2$-test was designed by its originator,
Karl Pearson, as a criterion for testing hypotheses about frequency
distributions. These hypotheses may be classified into two types
which we will call simple and composite. We are making an explicit
distinction between them and considering them separately to avoid
certain misunderstandings which have sometimes occurred, in the
past, in the applications of the test. To be more specific, there has
been, as a result of confounding hypotheses to be tested, some
controversy over the appropriate number of degrees of freedom to use in
entering the tables for $P(\chi^2 > \chi_0^2)$.

Simple Hypothesis. Under this heading we will consider those
cases in which the theoretical frequencies are known a priori, that is,
when they are not inferred in any way from the sample.

Suppose that we have a set of $k$ observed frequencies

$$m_1 + m_2 + \cdots + m_k = N$$

constituting a sample from a hypothetical universe (supposedly
infinite) in which the relative frequencies in the $k$ categories are
known to be $p_1, p_2, \dots, p_k$, respectively, where $p_i = \bar{m}_i / N$. Then,
corresponding to the observed frequencies, we have a set of $k$
theoretical frequencies such that

$$\bar{m}_1 + \bar{m}_2 + \cdots + \bar{m}_k = N.$$

An example would be, for the $m$'s, the frequency of heads obtained in
tossing $s$ coins $N$ times, and, for the $\bar{m}$'s, the corresponding theoretical
frequencies given by the terms in the expansion of the binomial
$N(\tfrac{1}{2} + \tfrac{1}{2})^s$. In comparing the observed and theoretical frequencies
a question quite naturally arises as to whether the aggregate
discrepancy between them could be explained on the basis of chance
fluctuations under the hypothesis that $\tfrac{1}{2}$ is the probability of success
in each trial. More generally, we are interested in such a question
as the following. On the hypothesis that an observed distribution is
a random sample from a proposed universe, what is the probability
that, taken as a whole, the discrepancy between theory and
observation would yield a value of $\chi^2$ as large as, or larger than, the value
obtained? The hypothesis is to be rejected whenever the probability
is considered "small."

If we let $x_i = (m_i - \bar{m}_i)/\sqrt{\bar{m}_i}$ it is clear that the $x$'s are subject to
the linear homogeneous restriction given by (9) with $n = k - 1$
degrees of freedom because, if $k - 1$ of the $x$'s are fixed, the $k$th is
determined. In the case of a simple hypothesis, then, Fisher's table
of $P$ is to be entered with $n = k - 1$.

With regard to levels of significance, Fisher says:

In preparing this table we have borne in mind that in practice we do not want
to know the exact value of $P$ for any observed $\chi^2$, but, in the first place, whether
or not the observed value is open to suspicion. If $P$ is between .1 and .9 there
is certainly no reason to suspect the hypothesis tested. If it is below .02 it is
strongly indicated that the hypothesis fails to account for the whole of the facts.
We shall not often be astray if we draw a conventional line at .05, and consider
that higher values of $\chi^2$ indicate a real discrepancy.

Composite Hypothesis. In the majority of practical cases, the
frequencies are not known a priori and must be estimated from the
sample. Thus, in a graduation by means of the normal curve the
theoretical frequencies are obtained by imposing the conditions that
the universe has the same mean and standard deviation as the sample.
The $\chi^2$-test can be accurately applied only if allowance is made for
the number of parameters which are determined from the sample in
reconstructing the universe. Suppose there are $q$ parameters in the
function representing the universe and these are to be determined
from the sample by the principle of moments. Since any moment is
a linear function of the frequencies (it will be remembered that the
frequencies are the variables in this discussion), the determination
of the $q$ parameters involves $q$ linear restrictions. We have seen in
§ 2 that the restriction imposed by (9) reduced our problem from a
space of $k$ dimensions to a space of $k - 1$ dimensions. Quite
analogously, $q$ additional linear restrictions reduce the space to $k - 1 - q$
dimensions. Accordingly,* in testing divergence from a universe
specified by a function $f(v; a, b, c, \dots)$ where $v$ is the variable of the
distribution and $a, b, c, \dots$ are $q$ disposable parameters which are
to be estimated from the sample, the number of degrees of freedom
with which to enter the tables of $P$ is $n = k - 1 - q$.

The following two conditions should be fulfilled in applying the
$\chi^2$-test (for both simple and composite hypotheses).

1. No class should contain very few items because, in the
derivation of (11), it was assumed that $m_i$ was sufficiently large to replace
$m_i!$ by its Stirling approximation.

2. The number of classes should not be very large since it can be
shown, by expanding the integrand in (13) into a power series, that
$P \to 1$ as $k \to \infty$.

We shall interpret, somewhat arbitrarily, these conditions to mean
that $P$ cannot be guaranteed when $m < 5$ and $k > 20$. To satisfy
the first condition, it is customary to lump together the small
frequencies at the ends of the distribution.

Example 1. Twelve dice were thrown 4096 times; only a throw of six was
counted a success. The expected frequencies are given by $4096(\tfrac{5}{6} + \tfrac{1}{6})^{12}$.
How improbable, taken as a whole, is the observed distribution shown in
Table 18?

* Strictly speaking, the determination of the parameters by the method of
moments does not lead to a system of equations which are exactly analogous
to (9).

Table 18

Number of      Observed         Theoretical
Successes      Frequency, $m$   Frequency, $\bar{m}$   $(m - \bar{m})^2$   $(m - \bar{m})^2 / \bar{m}$

0                  447              459               144            .3137
1                 1145             1103              1764           1.5993
2                 1181             1213              1024            .8442
3                  796              809               169            .2089
4                  380              364               256            .7033
5                  115              116                 1            .0086
6                   24               27                 9            .3333
7 and over           8                5                 9           1.8000

Totals            4096             4096                         $\chi^2 = 5.8113$

Entering Table III (see Appendix) with $n = 8 - 1 = 7$, and interpolating for
the value of $P$ corresponding to the observed value of $\chi^2 = 5.8113$, we find
$P = .56$. Hence there is no reason to reject the hypothesis that the underlying
chance of a success is $p = \tfrac{1}{6}$. That is, there is no reason to suspect that the
dice were biased.
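The arithmetic of Example 1 can be retraced in a few lines. The sketch below takes the observed frequencies from Table 18, regenerates the theoretical frequencies from the terms of $4096(\tfrac{5}{6} + \tfrac{1}{6})^{12}$ rounded to whole numbers, and sums the last column; the variable and function names are chosen here for illustration only.

```python
from math import comb

THROWS, DICE = 4096, 12
observed = [447, 1145, 1181, 796, 380, 115, 24, 8]   # Table 18, classes 0, 1, ..., 6, "7 and over"


def expected_frequency(successes):
    """Term of 4096 (5/6 + 1/6)^12 for the given number of successes."""
    return THROWS * comb(DICE, successes) * (1 / 6) ** successes * (5 / 6) ** (DICE - successes)


expected = [expected_frequency(r) for r in range(7)]
expected.append(THROWS - sum(expected))               # pooled class "7 and over"
theoretical = [round(value) for value in expected]    # whole numbers, as in Table 18

chi_square = sum((m - m_bar) ** 2 / m_bar for m, m_bar in zip(observed, theoretical))
degrees_of_freedom = len(observed) - 1                # simple hypothesis: n = k - 1
```

The rounded theoretical frequencies reproduce the third column of Table 18, and the sum agrees with the tabulated $\chi^2$ to four decimals.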

Example 2. An observed distribution was graduated by means of the normal 
curve (see Part I, p. 123) with the results shown in Table 19. Test the hypoth- 
esis that the observed distribution was a sample from a normal universe with 
mean and standard deviation equal respectively to those of the sample. 


Table 19

Central                Observed       Theoretical
Values                 Frequency      Frequency

29.5, 33.5 (pooled)       ...            ...
37.5                       56            60.2
41.5                      172           155.4
45.5                      245           252.6
49.5                      263           258.8
53.5                      156           167.2
57.5                       67            68.0
61.5, 65.5 (pooled)       ...            ...

Totals                   1000          1000.0


It is found that $\chi^2 = 4.82$. After pooling the end frequencies, as shown, $k = 8$.
So entering Table III for $n = 8 - 1 - 2 = 5$, we find that $P > .4$. Hence the
$\chi^2$-test does not reject the hypothesis.

For applications of the $\chi^2$-test to contingency tables the reader is referred to
Fisher's book.

B. STATISTICAL INFERENCE

5. Induction versus Deduction. To contrast the inductive
problems, which we are about to consider, with deductive problems, we
shall review briefly a deductive type of argument which we have
previously discussed. Suppose $D(t)$ is the distribution function of
a statistic $t$ computed from a sample from a universe specified with
respect to functional form and parameters. Then $\int_{-\infty}^{\delta} D(t)\, dt$ gives
the probability that an observed value of $t$ will not exceed an assigned
value of $\delta$. Thus in Chapter VI we learned that the means of
samples cluster about the mean of the universe, and Theorem X
of that chapter gave us the probability that a sample mean would
have a numerical value within $\delta$ of the mean of the universe. This
is a deductive argument. Presently we shall consider certain inverse
problems which arise in arguing from samples and their statistics
back to universes and their parameters. First, however, we shall
examine Bayes' Theorem. The following quotation from R. A.
Fisher will serve as a setting for our consideration of this theorem.

Thomas Bayes’ paper of 1763 was the first attempt known to us to rationalize 
the process of inductive reasoning. From time immemorial, of course, men had 
reasoned inductively; sometimes, no doubt, well, and sometimes badly, but 
the uncertainty of all such inferences from the particular to the general had 
seemed to cast a logical doubt on the whole process. By the middle of the 
eighteenth century, however, experimental science had taken its first strides, and 
all the learned world was conscious of the effort to enlarge knowledge by experi- 
ment, or by carefully planned observation. To such an age the limitations of a 
purely deductive logic were intolerable. Yet it seemed that mathematicians 
were willing to admit the cogency only of purely deductive reasoning. From 
an exact hypothesis, well defined in every detail, they were prepared to reason 
with precision as to its various particular consequences. But, faced with a 
finite, though representative, sample of observations, they could make no rigor- 
ous statements about the population from which the sample had been drawn. 

Bayes perceived the fundamental importance of this problem and framed an 
axiom, which, if its truth were granted, would suffice to bring this large class of 
inductive inferences within the domain of the theory of probability; so that, 
after a sample had been observed, statements about the population could be 
made, uncertain inferences, indeed, but having the well-defined type of un- 
certainty characteristic of statements of probability. Bayes’ technique in this 
feat is ingenious. His predecessors had supplied adequate methods, given a 
well-defined population, for stating the probability that any particular type of 
population might have given rise to it. He imagines, in effect, that the possible 
types of population have themselves been drawn, as samples, from a
super-population, and his axiom defines this super-population with exactitude. His
problem thus becomes a purely deductive one to which familiar methods were
applicable.

6. Bayes' Theorem. To derive Bayes' theorem, consider a
bivariate universe of discrete variables in which $x$ takes the values
$x_1, x_2, \dots, x_n$, and $y$ the values $y_1, y_2, \dots, y_m$. Let $P(x_i, y_j)$
represent the probability for the joint occurrence of $(x_i, y_j)$. Let
$P(y_j \mid x_i)$ be the probability that $y$ takes the value $y_j$ when it is
known that $x$ has taken the value $x_i$. Then

(14)    $P(y_j \mid x_i) = \dfrac{P(x_i, y_j)}{g(x_i)}$,

where $g(x_i) = \sum_{j=1}^{m} P(x_i, y_j)$ is the marginal distribution of $x$ in the
bivariate universe and represents the a priori probability that $x$ takes
the value $x_i$. Let us write (14) in the form

(15)    $P(x_i, y_j) = g(x_i) P(y_j \mid x_i)$.

By a similar argument we may write

(16)    $P(x_i, y_j) = h(y_j) P(x_i \mid y_j)$,

where $h(y_j) = \sum_{i=1}^{n} P(x_i, y_j)$ is the marginal distribution of $y$, and
$P(x_i \mid y_j)$ is the probability that $x = x_i$ when it is known that $y = y_j$.
It is clear from the preceding relations that

(17)    $h(y_j) = \sum_{i=1}^{n} g(x_i) P(y_j \mid x_i)$.

Since $P(x_i, y_j)$ means exactly the same thing in (15) and (16) we may
equate their right members and solve for $P(x_i \mid y_j)$. The result is

(18)    $P(x_i \mid y_j) = \dfrac{g(x_i) P(y_j \mid x_i)}{h(y_j)}$.

This is Bayes' theorem and it may be stated as follows.

Bayes' Theorem. The probability that $x = x_i$ when $y = y_j$ is equal
to the product of the probabilities that $x = x_i$, and that $y = y_j$ when
$x = x_i$, divided by the probability that $y = y_j$.

The theorem is usually expressed symbolically in the somewhat
different form to which it reduces when (17) is substituted for the
denominator of (18). This form is

(19)    $P(x_i \mid y_j) = \dfrac{g(x_i) P(y_j \mid x_i)}{\sum_{r=1}^{n} g(x_r) P(y_j \mid x_r)}$.

To connect Bayes' theorem with a posteriori, or inverse,
probability suppose in (19) that the $x$'s denote certain initial situations and
the $y$'s denote events subsequently observed. The a priori
probability for the existence (occurrence) of the initial situation
characterized by $x_i$ is $g(x_i)$. $P(y_j \mid x_i)$ is the a priori probability that $y_j$
will occur when $x_i$ exists. Then (19) gives the a posteriori
probability that the $i$th initial situation has produced the observed event
specified by $y_j$.

The following examples will clarify the theorem and serve to focus
attention on its weakness. The first example, a somewhat artificial
one, is designed to illustrate a situation where the existence
probabilities $g(x_i)$ are equal. The second will describe a situation when
nothing is known about them.


Example 3. (Molina) During his sophomore year Tom Smith played on
both the baseball and football teams; we have been informed that he broke his
ankle in one of the games; what are the a posteriori probabilities in favor of
baseball and football, respectively, as the baneful cause of the accident?
Evidently the answer depends on the number of baseball and football games played
during their respective seasons and also on the likelihood of a man breaking an
ankle in one or the other of these two sports. As a concrete case assume that:

(a) At Smith's college an equal number of baseball and football games are
played per season;

(b) Statistical records indicate that if a student participates in a baseball
game the probability is $\tfrac{2}{100}$ that he will break an ankle and that, likewise, the
probability is $\tfrac{7}{100}$ for the same contingency in a football game.

Solution. Associate $x_1$ and $x_2$ with the admissible causes, baseball and
football, respectively. Associate $y_1$ with the accident. From condition (a) of the
problem, the existence probability for baseball is $g(x_1) = \tfrac{1}{2}$. Also
$P(y_1 \mid x_1) = \tfrac{2}{100}$, and $P(y_1 \mid x_2) = \tfrac{7}{100}$. From (19), then, the a posteriori
probability for baseball is

$$P(x_1 \mid y_1) = \frac{\frac{1}{2} \cdot \frac{2}{100}}{\frac{1}{2} \cdot \frac{2}{100} + \frac{1}{2} \cdot \frac{7}{100}} = \frac{2}{9}.$$

It follows that the a posteriori probability in favor of football is $\tfrac{7}{9}$.
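Formula (19) is simple enough to compute mechanically. The sketch below, written for illustration with names chosen here, applies it to the figures of Example 3 using exact fractions so that the result can be compared with the answer found by hand.

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Formula (19): each g(x_i) P(y | x_i) divided by the sum of all such products."""
    joint = [g * p for g, p in zip(prior, likelihood)]
    total = sum(joint)
    return [value / total for value in joint]


prior = [Fraction(1, 2), Fraction(1, 2)]            # g(x_1), g(x_2): baseball, football
likelihood = [Fraction(2, 100), Fraction(7, 100)]   # P(y_1 | x_1), P(y_1 | x_2)
baseball, football = posterior(prior, likelihood)
```

Because the two existence probabilities are equal they cancel, and the result depends only on the ratio of the two accident probabilities.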

Example 4. An urn contains five balls, black, white, or both kinds. Of three
balls drawn together and at random (each ball within the urn is equally likely to
be drawn), two are black and one is white. What is the probability that the
urn contains three black and two white balls?

Solution. Associate $x_1, x_2, \dots, x_6$ with the possible compositions of the urn
before the drawing was made, namely, 0B, 5W; 1B, 4W; $\dots$; 5B, 0W. Associate
$y_1, y_2, y_3, y_4$ with the possible compositions in the drawing of three balls, namely,
0B, 3W; 1B, 2W; 2B, 1W; 3B, 0W. The composition corresponding to $y_3$
was obtained and we seek the probability that it came from an urn with
composition specified by $x_4$. That is, we seek $P(x_4 \mid y_3)$. Clearly,

$$P(y_3 \mid x_4) = \frac{C(3, 2)\, C(2, 1)}{C(5, 3)} = \frac{3}{5},$$

so from (19) we have

(20)    $P(x_4 \mid y_3) = \dfrac{g(x_4) \cdot \frac{3}{5}}{\sum_{i=1}^{6} g(x_i) \dfrac{C(i - 1, 2)\, C(6 - i, 1)}{C(5, 3)}}$,

it being understood, of course, that $C(n, r) = 0$ when $n < r$.

Since the values of $g(x_i)$ are unknown the problem does not have
a unique solution. Moreover, if they were known we would be
back in the domain of deductive probabilities again since all the
probabilities in the right-hand member of (20) would then be known
a priori. It is only when the $g(x_i)$ are unknown that we are properly
in the domain of a posteriori probability. In practical problems
the $g(x_i)$ are scarcely ever known.
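The dependence of (20) on the unknown $g(x_i)$ can be made concrete by trying two different sets of existence probabilities. Both sets in the sketch below are assumptions introduced purely for illustration: one makes the six compositions equally probable, the other (hypothetically) treats each ball as black or white with equal chance. The function names are likewise chosen here.

```python
from fractions import Fraction
from math import comb


def likelihood_two_black_one_white(black_in_urn, urn_size=5, draw=3):
    """P(y_3 | urn holds the given number of black balls); comb() is 0 when n < r."""
    white_in_urn = urn_size - black_in_urn
    return Fraction(comb(black_in_urn, 2) * comb(white_in_urn, 1), comb(urn_size, draw))


def posterior_three_black(prior):
    """Formula (20); prior[b] is the assumed g for an urn holding b black balls."""
    joint = [prior[b] * likelihood_two_black_one_white(b) for b in range(6)]
    return joint[3] / sum(joint)


equal_prior = [Fraction(1, 6)] * 6                               # assumed: all compositions alike
binomial_prior = [Fraction(comb(5, b), 32) for b in range(6)]    # assumed alternative

answer_equal = posterior_three_black(equal_prior)
answer_binomial = posterior_three_black(binomial_prior)
```

The first assumption gives 2/5 and the second 1/2, so the same drawing leads to different a posteriori probabilities according to what is supposed about the $g(x_i)$.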

Bayes realized this and argued that the $x$'s may be considered
equally probable unless we have some reason to think they are not.
Under this "doctrine of insufficient reason," the $x$'s are assumed to
have equal existence probabilities. In this case, $g(x_i)$ = constant
and would cancel in (19), thus permitting a definite solution in (20).
It appears that Bayes had serious doubts about this "doctrine" for
he withheld his entire treatise from publication until his doubts
should be resolved, and it was only after his death that his paper
was published by friends. Laplace, however, was less cautious, and
he incorporated the doubtful theorem into his Théorie Analytique des
Probabilités. Robed in the authority of Laplace it went
unquestioned for a long time. Boole was the first, in 1854, to criticize the
assumption of "the equal distribution of our knowledge, or rather
of our ignorance" and "the assigning to different states of things of
which we know nothing, equal degrees of probability." Today, it
is well known that the assumption of constant existence probabilities
may lead to mathematical contradictions. This may clearly be seen
in the analogue to (19) for continuous variables. The following
illustration of such a contradiction is cited by Wilks (loc. cit.).

Let $\theta$ be a parameter characterizing the universe and $t$ a statistic
from the sample. Then the analogue to (19) for the continuous
case is

(21)    $F(\theta \mid t)\, d\theta = \dfrac{g(\theta) f(t \mid \theta)\, d\theta\, dt}{dt \int g(\theta) f(t \mid \theta)\, d\theta}$.

Now, if according to the doctrine of insufficient reason we may
assume $g(\theta)$ to be constant, (21) reduces to

(22)    $F(\theta \mid t)\, d\theta = \dfrac{f(t \mid \theta)\, d\theta}{\int f(t \mid \theta)\, d\theta}$.

But by the very nature of this doctrine there is no more reason
to assume the a priori probability function of $\theta$ to be constant than
there is to assume the a priori probability distribution of some
function of $\theta$, say $\theta^2$, to be constant. The a priori distribution
of $\theta^2 = z$ is $g(\sqrt{z})/2\sqrt{z}$. If $g(\sqrt{z})/2\sqrt{z}$ is constant, then

$$F(\theta \mid t)\, d\theta = \frac{\theta f(t \mid \theta)\, d\theta}{\int \theta f(t \mid \theta)\, d\theta},$$

which is certainly inconsistent with (22).
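The inconsistency can be seen numerically. In the sketch below the form of $f(t \mid \theta)$ is an assumption made only for illustration: $t$ is taken to be the number of successes in 10 trials, each with chance $\theta$, so that $f(t \mid \theta)$ is proportional to $\theta^t (1 - \theta)^{10 - t}$ with $\theta$ between 0 and 1. The mean of $\theta$ is then computed once from (22) and once from the rival expression in which $f$ is weighted by $\theta$.

```python
def midpoint_integral(func, lower, upper, steps=20000):
    """Midpoint-rule approximation to the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


N_TRIALS, SUCCESSES = 10, 3   # assumed illustrative sample


def likelihood(theta):
    """f(t | theta) up to a constant factor, which cancels in both formulas."""
    return theta ** SUCCESSES * (1 - theta) ** (N_TRIALS - SUCCESSES)


def posterior_mean(weight):
    """Mean of theta when F(theta | t) is proportional to weight(theta) f(t | theta)."""
    numerator = midpoint_integral(lambda th: th * weight(th) * likelihood(th), 0.0, 1.0)
    denominator = midpoint_integral(lambda th: weight(th) * likelihood(th), 0.0, 1.0)
    return numerator / denominator


mean_constant_in_theta = posterior_mean(lambda th: 1.0)   # from (22)
mean_constant_in_theta_squared = posterior_mean(lambda th: th)
```

Under these assumptions the two means are 4/12 and 5/13 respectively, so the same sample yields two different a posteriori distributions, both claiming support from the same doctrine.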

In arguing from a sample to the universe, any inference must be 
attended with some degree of uncertainty. But uncertainty should 
not be confused with lack of rigor. As we shall see, statements can 
be made about population parameters, subject to risks of being
wrong, where the error is precisely expressed in terms of probability 
theory. In other words, the nature and degree of the uncertainty 
can be rigorously expressed. This can be accomplished without any 
assumptions regarding the a priori existence probabilities. 

7. Probable Error. The following concise exposition of the various
usages of the term "probable error" is due to Professor A. T. Craig.

There are in the literature three conceptions of the probable error.
If, purely for convenience of language, we refer to the probable error
of the mean, these conceptions can be stated as follows: (i) The
probable error of the mean is that deviation, extended on both sides
of the mean of the population, such that $\tfrac{1}{2}$ is the probability that the
mean of a sample will fall in this interval; (ii) The probable error of
a mean is that deviation, extended on both sides of the mean of a
sample, such that $\tfrac{1}{2}$ is the probability that the mean of the population
lies in this interval; (iii) The probable error of the mean is that
deviation, extended on both sides of the mean of a sample, such that $\tfrac{1}{2}$ is
the probability that the mean of another sample will fall in this
interval. Conception (i) leads without difficulty to the usual formula
$.6745(\sigma/\sqrt{N})$ for the probable error of the mean. This formula is
rigorously correct for samples of any size drawn from a normal
population and is valid for large samples drawn from any population with
finite variance. On the other hand, the formula cannot be
established under conception (ii) without further assumptions. If, before
the sample is drawn, it is assumed, in the absence of any knowledge
concerning the distribution of possible values of the mean of the
population, that the existence distribution is constant, then the
formula admits mathematical proof. But this assumption is
essentially the same assumption as that made in applying Bayes' Theorem
to problems of probability a posteriori.
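Under conception (i) the factor .6745 can be verified directly for a normal population: the chance that a sample mean falls within $.6745(\sigma/\sqrt{N})$ of the population mean should be one half. The sketch below uses the error function from Python's standard library; the values of $\sigma$ and $N$ and the function names are hypothetical choices made for illustration.

```python
import math


def probable_error_of_mean(sigma, n):
    """The usual formula .6745 (sigma / sqrt(N))."""
    return 0.6745 * sigma / math.sqrt(n)


def chance_within(deviation, sigma, n):
    """P(|sample mean - population mean| < deviation) for a normal population."""
    standard_error = sigma / math.sqrt(n)
    return math.erf(deviation / standard_error / math.sqrt(2))


sigma, n = 5.0, 20                       # hypothetical population and sample size
pe = probable_error_of_mean(sigma, n)
chance = chance_within(pe, sigma, n)     # very nearly one half
```

The result does not depend on the particular $\sigma$ and $N$ chosen, since both cancel in the ratio of the deviation to the standard error of the mean.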

The modern method of expressing the reliability of a statistical 
estimate of a population parameter in terms of fiducial limits seems 
likely to replace the traditional but often misleading mode of expres- 
sion involving probable error. The rest of the chapter is devoted to 
this recent advance in statistical inference. 

8. Fiducial Theory. The material of this section is reproduced
from a recent paper on this subject by Rietz.

In explaining the meaning of the probable error of a statistic, one
of the usual types of definition is essentially the following: The
probable error of a statistic, $t$, is a positive number, $E_t$, such that the
chances are even that the population parameter of which $t$ is an estimate
from the sample, will fall within the interval $t - E_t$ to $t + E_t$.

This definition contains an inference about the values of a popula- 
tion parameter on the basis of information obtained from a random 
sample drawn from the population. 

Formulas for $E_t$, in terms of observed data, when $t$ may represent
any one of a considerable number of statistics, say an arithmetic
mean or a correlation coefficient, are usually listed for convenient
application in numerous textbooks for teaching courses in
statistics.

Under the definition stated above, it is noteworthy that these
formulas depend on a fundamental assumption whose validity has long
been in doubt. The assumption in question is to the effect that
initially, that is, before our drawings of a sample are made, in our
lack of knowledge about the distribution of possible values of an
unknown parameter, say of $\theta$, we may assume the existence
distribution of $\theta$ to be constant.

The invalidity of this assumption in many applied problems of 
statistical interest may be seen clearly in cases of a continuous distri- 
bution function with a derivative. Suppose that our initial assump- 
tions relating to a parameter 6 were such that 0 would initially be 
distributed in accord with a continuous frequency function, gr(0), 
which has a derivative at each point within its possible range on 0, 
say from 6 - a to 6 = Next, suppose g{6) were restricted to be 
constant throughout the range of 6. Then it is well known that the 
distribution of a simple non-linear function of d would not be con- 
stant. For example, the distribution of z = 6^ (n 6 real and 
non-negative) would not be constant, but would be distributed in 
accord with a frequency function But if 0 is a popula- 

tion parameter, it seems fairly obvious that the logical character 
of our theory should usually, if not always, be such as to enable us to 
use a power of 0 as a parameter if we found it convenient to do so. 

The preceding introduction is designed to lead up to the important
fact that, although in the usual statistical inquiry by sample, the
true value of the population parameter $\theta$ is unknown and remains
unknown, there are cases in which precise statements can be made in
terms of probabilities about the bounds within which a parameter
$\theta$ lies without making an assumption about the initial
distribution of the possible values of $\theta$. It has been only about
seven years since R. A. Fisher initiated some important ideas in this
connection to which interesting contributions have been made by several
mathematical statisticians.

For simplicity, consider a case of a single parameter, $\theta$, in
which we know the frequency function of the statistic, $t$, to be given
by an integrable function

(23)    $f(t, \theta)$

where the values of $t$ obtained from observation may be assumed to
be good estimates of $\theta$. Suppose we know (23) in such form that
it is possible to calculate a table of values of the probabilities that
the statistic, $t$, will fall into an assigned interval selected on a
possible range $(a, b)$ for any assigned value of $\theta$ within the
possible range $(\alpha, \beta)$ of $\theta$.

Next, for illustration, select a positive number $\epsilon$, say
$\epsilon = .005$, on which to base a certain level of confidence about
values of $\theta$ to be expressed in terms of probabilities.

As our main problem may be clarified by a geometrical representation,
conceive of corresponding values of $t$ and $\theta$ obtained in an
extensive statistical experiment as represented by rectangular
coordinates within the rectangle bounded by lines $t = a$, $t = b$,
$\theta = \alpha$, $\theta = \beta$ (Fig. 23).

Consider an arbitrary assignment for $\theta$, say that
$\theta = \theta'$ is the true value of $\theta$. This gives the line
AB (Fig. 23). Since the distribution of the statistic $t$ is assumed to
be known for each assigned value of $\theta$, we may locate on the line
AB two points, $t_1$ and $t_2$ ($t_1 < t_2$), such that $\epsilon$ is
equal to the probability that a random sample will yield a value of $t$
less than or equal to $t_1$, and similarly $\epsilon$ is the
probability that such a sample will yield a value greater than or equal
to $t_2$. Then we have an interval on AB from $t_1$ to $t_2$ such that
$1 - 2\epsilon$ is the probability that the random sample will yield a
value within this interval.

More formally stated, we may introduce a function $F(t, \theta)$
defined as the definite integral of $f(t, \theta)$ in (23) from
$t = a$ to $t$. That is,

    $F(t, \theta) = \int_a^t f(t, \theta)\, dt$,

for any arbitrarily assigned real value of $\theta$ on its range from
$\alpha$ to $\beta$. Then

    $F(a, \theta) = 0$, $F(b, \theta) = 1$, $F(t_1, \theta') = \epsilon$, $F(t_2, \theta') = 1 - \epsilon$,

    $(0 < \epsilon < 1)$.
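As a minimal sketch of how $t_1$ and $t_2$ are located for one assigned
value $\theta'$, suppose (an assumption made only for this
illustration) that $t$ is normally distributed about $\theta'$ with
unit standard error, so that the range of $t$ is unbounded. The value
`theta_prime = 10.0` is hypothetical.

```python
from statistics import NormalDist

eps = 0.005
theta_prime = 10.0   # hypothetical assigned value theta'

# Assumed law of the statistic t when theta = theta'
dist = NormalDist(mu=theta_prime, sigma=1.0)

t1 = dist.inv_cdf(eps)        # F(t1, theta') = eps
t2 = dist.inv_cdf(1 - eps)    # F(t2, theta') = 1 - eps

# Probability that a random sample gives t between t1 and t2
prob_inside = dist.cdf(t2) - dist.cdf(t1)
print(round(t1, 3), round(t2, 3), round(prob_inside, 3))
```

Under these assumptions the interval from $t_1$ to $t_2$ carries
probability $1 - 2\epsilon = .99$.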

By considering all possible assignments of $\theta$, in its possible
range $(\alpha, \beta)$, the locus of our set of lower values of $t$,
illustrated by $t_1$ on the line AB, will give a continuous curve which
we mark with $C_\epsilon$ in Figure 23, the subscript $\epsilon$ being
used to remind us that $\epsilon$ is the probability that a random
value of $t$ for $\theta = \theta'$ will fall below or at it.
Similarly, our set of upper values of $t$, illustrated by $t_2$ on AB,
give a curve which we mark with $C_{1-\epsilon}$.

If $t$ is a good estimate of $\theta$, its value usually, if not
always, increases with $\theta$ for all possible values. Thus, we shall
restrict our further considerations to cases in which we may assume
that $t$ increases as $\theta$ increases and vice versa. More precisely
we are concerned with one-valued monotone increasing functions
represented by the two curves marked $C_\epsilon$ and $C_{1-\epsilon}$.
The region bounded by these two curves and the lines $\theta = \alpha$
and $\theta = \beta$ has been called by Neyman the confidence belt with
confidence coefficient equal to $1 - 2\epsilon$.

Next, consider the set of points, $(t, \theta)$, that would be obtained
in Figure 23 in carrying out an extensive statistical experiment for
which we seek a degree of accuracy in the long run, indicated by the
value we assign to $\epsilon$. Then it is fairly obvious that the
confidence belt is so constructed that $1 - 2\epsilon$ is the expected
relative frequency with which points, $(t, \theta)$, will lie inside
the confidence belt, and $2\epsilon$ is the expected relative frequency
with which such points will lie outside the confidence belt or on its
boundary, whatever the nature of the initial distribution function of
the parameter $\theta$ may be.
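The long-run statement may be illustrated by simulation. In the sketch
below the initial distribution of $\theta$ is deliberately taken to be
far from constant (an exponential law, chosen arbitrarily), and $t$ is
assumed normal about $\theta$ with unit standard error; both choices
are assumptions of the illustration and not part of the text.

```python
import random
from statistics import NormalDist

eps = 0.025
# Under the assumed normal law the belt is theta - z < t < theta + z
z = NormalDist().inv_cdf(1 - eps)

random.seed(7)
trials = 100_000
inside = 0
for _ in range(trials):
    theta = random.expovariate(0.2)   # non-constant initial distribution
    t = random.gauss(theta, 1.0)      # statistic t for this theta
    if theta - z < t < theta + z:     # point (t, theta) inside the belt
        inside += 1

coverage = inside / trials
print(round(coverage, 3))
```

The relative frequency of points inside the belt comes out close to
$1 - 2\epsilon = .95$, although no constant distribution was assumed
for $\theta$.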

Conceive of drawing a large number of sets of random samples of $N$
items each from a population consisting either of an infinite supply
or of a finite supply with replacements, and suppose that one of these
samples, taken at random, yields a value $t = t_0$ for a certain
statistic. Then the line $t = t_0$ parallel to the $\theta$-axis would
fail to intersect the boundaries of the confidence belt in two points
in at most a small fractional part (less than $2\epsilon$) of the total
number of sets of drawings. Denote the ordinates of the points in which
the line $t = t_0$ cuts the curves $C_{1-\epsilon}$ and $C_\epsilon$ by
$\theta_1$ and $\theta_2$, respectively (Figure 23). These boundary
values of $\theta$ are called fiducial limits of $\theta$ that
correspond to $t = t_0$, and the interval $\theta_1$ to $\theta_2$ is
called the fiducial interval for $t = t_0$. It is important to
emphasize that the statement that $1 - 2\epsilon$ is the probability
that a value of $\theta$ taken at random will fall into the confidence
belt is to be associated with the whole belt, that is, with results of
repeated application of a sampling procedure to all values of $t$ met
with in an extensive statistical experiment, and not merely with an
assigned $t$. The probability that $(\theta, t)$ falls within the
confidence belt may differ for different assignments of $t$, but in the
long run of statistical experience, the expected relative frequency of
points within the confidence belt is $1 - 2\epsilon$. By choosing
$\epsilon$ to be small, the probability is nearly 1 that the parameter
lies within the confidence belt.
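The passage from an observed $t_0$ to the fiducial limits amounts to
solving $F(t_0, \theta_1) = 1 - \epsilon$ and
$F(t_0, \theta_2) = \epsilon$ for $\theta$. The following sketch does
this by bisection for an assumed example in which $t$ is a single
observation from an exponential universe with mean $\theta$, so that
$F(t, \theta) = 1 - e^{-t/\theta}$; the example, the value `t0 = 3.0`,
and the function names are hypothetical.

```python
import math

def F(t, theta):
    """Assumed distribution function of t: exponential with mean theta."""
    return 1.0 - math.exp(-t / theta)

def solve_theta(t0, p, lo=1e-6, hi=1e6, iters=200):
    """Find theta with F(t0, theta) = p; F decreases as theta increases."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)   # geometric midpoint of a wide positive range
        if F(t0, mid) > p:
            lo = mid               # F too large, so theta must be larger
        else:
            hi = mid
    return math.sqrt(lo * hi)

eps = 0.005
t0 = 3.0                            # hypothetical observed value of t
theta1 = solve_theta(t0, 1 - eps)   # where t = t0 cuts the curve C_{1-eps}
theta2 = solve_theta(t0, eps)       # where t = t0 cuts the curve C_eps
print(round(theta1, 4), round(theta2, 1))
```

For this particular $F$ the limits can also be written in closed form,
$\theta_1 = -t_0/\log \epsilon$ and $\theta_2 = -t_0/\log(1 - \epsilon)$,
which gives a check on the numerical solution.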

The theory of confidence belts and fiducial intervals finds its
main application in the testing of a certain hypothesis for possible
rejection under the assumption that it is true. Such a hypothesis
has been termed a null hypothesis. If, for a given $\epsilon$, the null
hypothesis is rejected due to the value of $t$ found from the actual
data, the value of $t$ is said to be significant at the level of
probability equal to $2\epsilon$. On the other hand, a value of $t$
from observed data which does not reject the null hypothesis is said to
be non-significant.

9. Fiducial Limits. (a) For the mean. Let $\bar{x}$ and $s$ be the mean
and standard deviation of a sample of $N = n + 1$ items drawn from
a normal universe with unknown mean $\tilde{x}$. The problem is to
determine an interval surrounding $\bar{x}$ in which we may assume,
with a certain degree of confidence, that $\tilde{x}$ is contained. We
learned in Chapter VII that the variable

(24)    $t = \frac{\sqrt{n}\,(\bar{x} - \tilde{x})}{s}$

is distributed in accord with the $F_n(t)$ curve and that
$P = 1 - P_n(t)$ has been tabulated for various values of $t$ and $n$,
where

    $P_n(t) = \int_{-t}^{t} F_n(t)\, dt$.

Therefore, for an assigned $\epsilon$ and for an assigned value of $n$
($n < 30$), we may obtain from the tables upper and lower critical
values of $t$ by solving the equation $P = 2\epsilon$. With these
critical values we can determine from (24) the required interval
surrounding $\bar{x}$ for the given value of $\epsilon$. It is
conventional among certain workers to take $\epsilon = .005$ (or .025)
since they wish to determine values of the estimates of $\tilde{x}$ in
an interval dividing hypotheses that will be rejected from those
acceptable under a null hypothesis at the 1% (or 5%) level of
significance.

Suppose, then, that we make the claim

(25)    $\bar{x} - \frac{t_\epsilon s}{\sqrt{n}} < \tilde{x} < \bar{x} + \frac{t_\epsilon s}{\sqrt{n}}$

and we desire the probability of an error in this statement to be not
more than $2\epsilon = .01$. Taking $n = 15$, for example, we find from
Table 12, Chapter VII, that $t = \pm 2.947$ when $P = .01$. Then we
have

    $(\bar{x} - \tilde{x}) = \frac{\pm 2.947 s}{\sqrt{15}} = \pm .76 s$

and the claim

    $\bar{x} - .76s < \tilde{x} < \bar{x} + .76s$

will be correct 99% of the time.
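The tabled value may be checked by direct numerical integration of the
$F_n(t)$ curve. The sketch below uses the standard form of Student's
frequency function and Simpson's rule; the function names and the
number of integration steps are choices made here, not part of the
text.

```python
import math

def t_density(t, n):
    """Student's frequency function F_n(t) with n degrees of freedom."""
    c = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return c * (1 + t * t / n) ** (-(n + 1) / 2)

def area_inside(t_eps, n, steps=2000):
    """P_n(t_eps): area under F_n(t) from -t_eps to +t_eps (Simpson's rule)."""
    h = 2 * t_eps / steps
    total = t_density(-t_eps, n) + t_density(t_eps, n)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(-t_eps + i * h, n)
    return total * h / 3

n = 15
t_eps = 2.947
P = 1 - area_inside(t_eps, n)       # probability of an error in claim (25)
half_width = t_eps / math.sqrt(n)   # multiplier of s in the fiducial limits
print(round(P, 3), round(half_width, 2))
```

The computed $P$ agrees with the tabled .01 to three decimals, and the
multiplier of $s$ rounds to .76 as in the text.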

It is clear from the above procedure that our confidence in the
fiducial limits $\bar{x} \pm t_\epsilon s/\sqrt{n}$ is measured by the
area under the $F_n(t)$ curve inside $t = \pm t_\epsilon$, that is, by
$P_n(t_\epsilon)$. This means that if we could observe all possible
samples, the proportion represented by $P_n(t_\epsilon)$ would yield
values of $\bar{x}$ and $s$ for which the claim (25) is true, while the
remaining proportion, $P = 1 - P_n(t_\epsilon)$, would yield values of
$\bar{x}$ and $s$ for which the claim is false.

Fig. 24

If we were testing a hypothetical value of $\tilde{x}$ we would say
that $\bar{x}$ is not significant at the 1% level of significance if
$\tilde{x}$ has any value in the interval
$\bar{x} \pm t_\epsilon s/\sqrt{n}$, $\epsilon = .005$. If $\tilde{x}$
does not lie in this interval we say that $\bar{x}$ is significant at
this level.

Obviously, values of $t$ satisfying the equation $P = .01$, that is,
$P_n(t) = .99$, vary with $n$. To avoid the trouble of entering a table
we give an alternate method which is valid when the sample is not
small. Recall that the variable

    $t = \frac{(\bar{x} - \tilde{x})\sqrt{N - 3}}{s}$

is approximately normally distributed when $N > 30$. The area
under the normal curve outside $t = \pm 2.576$ is .01. Therefore, the
99% fiducial range of $\tilde{x}$ is then

    $\bar{x} \pm \frac{2.576 s}{\sqrt{N - 3}}$

and the range gets smaller as $N$ increases.
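A short sketch of the large-sample rule follows. The normal deviate is
taken from Python's standard library rather than from a table, and the
sample values passed to the function (`xbar=50.0`, `s=10.0`, `N=103`)
are hypothetical.

```python
import math
from statistics import NormalDist

def large_sample_limits(xbar, s, N, conf=0.99):
    """Approximate fiducial limits xbar -/+ z*s/sqrt(N - 3), meant for N > 30."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * s / math.sqrt(N - 3)
    return xbar - half, xbar + half

z99 = NormalDist().inv_cdf(0.995)   # deviate cutting off .005 in each tail
lo, hi = large_sample_limits(xbar=50.0, s=10.0, N=103)
lo_big, hi_big = large_sample_limits(xbar=50.0, s=10.0, N=403)
print(round(z99, 3), round(lo, 3), round(hi, 3))
```

The deviate agrees with the 2.576 used above, and the second call shows
the range narrowing as $N$ increases.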

(b) For the difference between two means. Let $\bar{x}_1$ and $s_1^2$
be the observed mean and variance of a sample of $N_1$ drawn from a
normal universe with unknown mean $\tilde{x}_1$ and let $\bar{x}_2$ and
$s_2^2$ be the observed mean and variance of a sample of $N_2$ drawn
from a normal universe with unknown mean $\tilde{x}_2$. It is assumed
that the two universes have a common variance $\sigma^2$. For brevity,
let

    $\bar{w} = \bar{x}_1 - \bar{x}_2$, $\tilde{w} = \tilde{x}_1 - \tilde{x}_2$, $N = N_1 + N_2$.

Then the variable

(26)    $t = \frac{\bar{w} - \tilde{w}}{\sqrt{\frac{N_1 s_1^2 + N_2 s_2^2}{N - 2}\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}}$

is distributed in accord with $F_n(t)$ for $n = N - 2$. From (26),
upper and lower fiducial values of $\tilde{w}$ can be found by
assigning to $t$ the solutions of $P_n(t) = .99$, that is, of
$P = .01$. If the value $\tilde{w} = 0$ falls outside the fiducial
interval thus established, the conclusion is that the difference
between the means is significant at the 1% level. That is,
$\tilde{w} \neq 0$ and hence $\tilde{x}_1 \neq \tilde{x}_2$.
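The computation may be sketched as follows, assuming the usual pooled
form in which the sums of squares $N_1 s_1^2$ and $N_2 s_2^2$ are added
and divided by $N - 2$. The two samples are invented for illustration,
and the critical value 2.977 is the tabled $t$ for $n = 14$ and
$P = .01$.

```python
import math

def pooled_t_limits(x1, x2, t_crit):
    """Fiducial limits for the difference of two universe means (common variance)."""
    N1, N2 = len(x1), len(x2)
    m1, m2 = sum(x1) / N1, sum(x2) / N2
    ss1 = sum((x - m1) ** 2 for x in x1)   # N1 * s1^2
    ss2 = sum((x - m2) ** 2 for x in x2)   # N2 * s2^2
    n = N1 + N2 - 2                        # degrees of freedom
    se = math.sqrt((ss1 + ss2) / n * (1 / N1 + 1 / N2))
    w_bar = m1 - m2
    return w_bar - t_crit * se, w_bar + t_crit * se, n

# Hypothetical samples of eight items each
a = [20, 22, 19, 24, 25, 21, 23, 22]
b = [18, 17, 20, 16, 19, 18, 17, 19]
lower, upper, n = pooled_t_limits(a, b, t_crit=2.977)
print(n, round(lower, 3), round(upper, 3))
```

Here zero lies outside the limits, so for these invented data the
difference between the means would be called significant at the 1%
level.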

If the two samples are equal in number so that the variates can be
paired in some manner we may compute (26) by a different method.
Let $N = N_1 = N_2$, $w = x_1 - x_2$, and compute $\bar{w}$ and
$\sum_{1}^{N} (w - \bar{w})^2$. Then

(27)    $t = \frac{\bar{w} - \tilde{w}}{s_w / \sqrt{N - 1}} = \frac{\bar{w} - \tilde{w}}{\left[\frac{\sum (w - \bar{w})^2}{N(N - 1)}\right]^{1/2}}$

The last expression is sometimes called Bessel's Formula.


Example 5. (Snedecor) Imagine a newly discovered apple, attractive in
appearance, delicious in flavor, having apparently all the
qualifications of success. It has been christened "King." Only its
yielding capacity in various localities is yet to be tested. The
following procedure is decided upon. King is planted adjacent to
Standard in 15 orchards scattered about the region suitable for
production. Years later, when the trees have matured, the yields are
measured and recorded in the following table where $x_1$ refers to
King, $x_2$ to Standard, and $w = x_1 - x_2$. The yields are in
bushels.


    x_1      x_2       w      (w - \bar{w})^2

    13       11        2           16
    12        6        6            0
    10        3        7            1
     6        1        5            1
    13        7        6            0
    15       10        5            1
    19        9       10           16
    10        4        6            0
    11        3        8            4
    11        6        5            1
    13        8        5            1
     9        5        4            4
    14        7        7            1
    12        6        6            0
    12        4        8            4

    Totals            90           50


Substituting in (27) we get

    $t = \frac{6 - \tilde{w}}{\left[\frac{50}{(15)(14)}\right]^{1/2}} = \frac{6 - \tilde{w}}{.488}$.

Interpolating in Table III for $n = 14$ and checking the result in the
more extensive table in Fisher's text we find that $P = .01$ when
$t = 2.977$. Then solving the equation

    $\frac{6 - \tilde{w}}{.488} = \pm 2.977$

we obtain $\tilde{w} = 4.55$ and $\tilde{w} = 7.45$. Since
$\tilde{w} = 0$ is outside the interval from 4.55 to 7.45, the observed
value of $\bar{w}$ differs significantly from either value of the
parameter. In other words, for these as well as for all values outside
the fiducial interval 4.55-7.45, we would reject (at the 1% level of
significance) the null hypothesis that there is no significant
difference between the yields of the two varieties, insofar as their
means provide a criterion of judgment.
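The arithmetic of Example 5 can be retraced in a few lines. The yields
are those of the table; the critical value 2.977 is taken from the text
and not computed, and the variable names are chosen here.

```python
import math

king     = [13, 12, 10, 6, 13, 15, 19, 10, 11, 11, 13, 9, 14, 12, 12]
standard = [11, 6, 3, 1, 7, 10, 9, 4, 3, 6, 8, 5, 7, 6, 4]

w = [a - b for a, b in zip(king, standard)]   # paired differences
N = len(w)
w_bar = sum(w) / N                            # mean difference
ss = sum((d - w_bar) ** 2 for d in w)         # sum of squared deviations
se = math.sqrt(ss / (N * (N - 1)))            # denominator of (27)

t_crit = 2.977                                # tabled t for n = 14, P = .01
lower = w_bar - t_crit * se
upper = w_bar + t_crit * se
print(w_bar, ss, round(se, 3), round(lower, 2), round(upper, 2))
```

The mean difference is 6, the sum of squares 50, the denominator .488,
and the limits 4.55 and 7.45, in agreement with the worked solution.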

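(c) For the variance. In (25) of Chapter VII we obtained the
distribution of $s^2$ which we will now write in the form

    $F(s^2)\, ds^2 = \frac{(N/\sigma^2)^{(N-1)/2}}{2^{(N-1)/2}\, \Gamma\left(\frac{N-1}{2}\right)}\, (s^2)^{(N-3)/2}\, e^{-N s^2 / 2\sigma^2}\, ds^2$.
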
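If we let $\chi^2 = N s^2/\sigma^2$ we get the $\chi^2$ distribution
given in (12) with $N$ replacing $k$,

    $T(\chi^2)\, d\chi^2 = \frac{e^{-\chi^2/2}\, (\chi^2)^{(N-3)/2}}{2^{(N-1)/2}\, \Gamma\left(\frac{N-1}{2}\right)}\, d\chi^2$.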

That we should thus obtain (12) is more than a coincidence, because
it turns out that $N s^2/\sigma^2$ actually is $\chi^2$ for $N$
observations made on a single magnitude. If now we let $n = N - 1$ we
obtain the distribution for $n$ degrees of freedom,

(28)    $T_n(\chi^2)\, d\chi^2 = \frac{e^{-\chi^2/2}\, (\chi^2)^{(n-2)/2}}{2^{n/2}\, \Gamma\left(\frac{n}{2}\right)}\, d\chi^2$.


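To determine the fiducial limits of $\sigma^2$ we first observe from
(3) of Chapter VII that
$N s^2 = n\hat{\sigma}^2 = \sum (x_i - \bar{x})^2$, and therefore we
may write $\chi^2 = n\hat{\sigma}^2/\sigma^2$. If now we make the claim

    $\frac{n\hat{\sigma}^2}{\chi_2^2} < \sigma^2 < \frac{n\hat{\sigma}^2}{\chi_1^2}$

where $\chi_1^2$ and $\chi_2^2$ are arbitrarily chosen constants
($\chi_1^2 < \chi_2^2$), then our "measure of confidence" in the
correctness of this claim is given by
$I_n(\chi_1^2) - I_n(\chi_2^2)$, where

    $I_n(\chi^2) = \int_{\chi^2}^{\infty} T_n(\chi^2)\, d\chi^2$.

Values of $I_n(\chi^2)$ can be obtained from Pearson's Tables.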
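Where tables are not at hand, the chance of exceeding a given
$\chi^2$ can be computed from the series for the incomplete gamma
function. The sketch below is one such computation; the function name,
the sample values ($N = 15$, $s^2 = 4.0$), and the two constants 4.075
and 31.319 (the usual tabled points cutting off .005 in each tail for
$n = 14$) are assumptions of the illustration. The sum of squares is
written as $N s^2$.

```python
import math

def chi2_upper_tail(chi2, n):
    """Chance that chi-square with n degrees of freedom exceeds chi2."""
    a, x = n / 2, chi2 / 2
    if x <= 0:
        return 1.0
    # Series for the lower incomplete gamma function, term by term
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= x / (a + k)
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return 1.0 - lower

N, s2 = 15, 4.0                 # hypothetical sample size and variance
n = N - 1
chi2_1, chi2_2 = 4.075, 31.319  # chosen constants, chi2_1 < chi2_2

lower_limit = N * s2 / chi2_2
upper_limit = N * s2 / chi2_1
confidence = chi2_upper_tail(chi2_1, n) - chi2_upper_tail(chi2_2, n)
print(round(lower_limit, 2), round(upper_limit, 2), round(confidence, 3))
```

With these constants the measure of confidence is very nearly .99. A
different pair of constants would give different limits and a different
measure of confidence, since the constants are arbitrary.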
Fig. 25


For further study of fiducial inference and its applications to testing
hypotheses, the reader is referred to the publications of Fisher,
Neyman, and Wilks.
Exercises


1. Read the following paper: The $\chi^2$-Test of Significance, T. C.
Fry, Journal of the American Statistical Association, vol. 33,
pp. 513-525. (The three papers following Fry's exposition are also
recommended.)

2. Toss seven coins 128 times and record the frequencies of heads.
Apply the $\chi^2$-test to the resulting distribution.

3. Graduate an appropriate distribution in Part I by means of the
normal curve and test the composite hypothesis that the observed
distribution was a sample from a normal universe having the mean and
standard deviation of the sample.

4. Give a report on $\chi^2$ and contingency tables.

5. (Chrystal) A bag contains three balls, each of which is either white
or black, all possible numbers of white being equally likely. Two at
once are drawn at random and prove to be white. What is the probability
that all of the balls are white? Ans.

6. If, in Example 4, it is assumed that, initially, all possible
numbers of white balls in the urn are equally likely, what is the
solution?

7. If $N$ is large show that the 95% fiducial range of $\tilde{x}$ for
a normal universe is

    $\bar{x} \pm 1.96 s/\sqrt{N - 3}$.

8. Making use of the references cited prepare a report on fiducial
inference.
Review Problem


A question arose in a physical education class as to whether eleven-year-old 
girls weigh, as a rule, more than eleven-year-old boys. Suppose you wished to 
make a thorough analysis of the data in the table below concerning weights of 
boys and girls aged eleven. Describe the tests you might apply, the reasoning 
and assumptions underlying these, and the interpretation that might be placed 
on the results. 


    Weight (pounds)         Frequency
    Class Marks          Boys      Girls

        42.5               1          0
        48.5               3          1
        54.5               9          7
        60.5              33         37
        66.5              65         41
        72.5              80         59
        78.5              72         58
        84.5              41         48
        90.5              27         23
        96.5               7         26
       102.5               4         16
       108.5               2          5
       114.5               1          3
       120.5               0          2

       Totals            345        326


The following points are suggested for discussion:

(a) Is there a clear difference between the two distributions? How
would you test this: from the means, from the variances, from the
samples as a whole?

(b) 32.3% of the boys and 26.4% of the girls have weights less than
69.5 pounds. Is this difference significant?

(c) Within what limits would you say that the mean and standard
deviation in the population of eleven-year-old boys (from which you
have the sample of 345) is almost certain to lie in each case?

(d) Summarize your results.
APPENDIX


Tables

I. Ordinates and Areas of the Normal Curve.

II. 5% and 1% Points for the Distribution of F.

III. $\chi^2$ Probability Scale.




Table I. Ordinates and Areas of the Normal Curve, $\phi(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$

(In each group of three columns: $t$, the ordinate $\phi(t)$, and the area $\int_0^t \phi(t)\, dt$.)

    t     ordinate   area        t     ordinate   area        t     ordinate   area

   .00    .39894    .00000      .45    .36053    .17364      .90    .26609    .31594
   .01    .39892    .00399      .46    .35889    .17724      .91    .26369    .31859
   .02    .39886    .00798      .47    .35723    .18082      .92    .26129    .32121
   .03    .39876    .01197      .48    .35553    .18439      .93    .25888    .32381
   .04    .39862    .01595      .49    .35381    .18793      .94    .25647    .32639
   .05    .39844    .01994      .50    .35207    .19146      .95    .25406    .32894
   .06    .39822    .02392      .51    .35029    .19497      .96    .25164    .33147
   .07    .39797    .02790      .52    .34849    .19847      .97    .24923    .33398
   .08    .39767    .03188      .53    .34667    .20194      .98    .24681    .33646
   .09    .39733    .03586      .54    .34482    .20540      .99    .24439    .33891
   .10    .39695    .03983      .55    .34294    .20884     1.00    .24197    .34134
   .11    .39654    .04380      .56    .34105    .21226     1.01    .23955    .34375
   .12    .39608    .04776      .57    .33912    .21566     1.02    .23713    .34614
   .13    .39559    .05172      .58    .33718    .21904     1.03    .23471    .34850
   .14    .39505    .05567      .59    .33521    .22240     1.04    .23230    .35083
   .15    .39448    .05962      .60    .33322    .22575     1.05    .22988    .35314
   .16    .39387    .06356      .61    .33121    .22907     1.06    .22747    .35543
   .17    .39322    .06749      .62    .32918    .23237     1.07    .22506    .35769
   .18    .39253    .07142      .63    .32713    .23565     1.08    .22265    .35993
   .19    .39181    .07535      .64    .32506    .23891     1.09    .22025    .36214
   .20    .39104    .07926      .65    .32297    .24215     1.10    .21785    .36433
   .21    .39024    .08317      .66    .32086    .24537     1.11    .21546    .36650
   .22    .38940    .08706      .67    .31874    .24857     1.12    .21307    .36864
   .23    .38853    .09095      .68    .31659    .25175     1.13    .21069    .37076
   .24    .38762    .09483      .69    .31443    .25490     1.14    .20831    .37286
   .25    .38667    .09871      .70    .31225    .25804     1.15    .20594    .37493
   .26    .38568    .10257      .71    .31006    .26115     1.16    .20357    .37698
   .27    .38466    .10642      .72    .30785    .26424     1.17    .20121    .37900
   .28    .38361    .11026      .73    .30563    .26730     1.18    .19886    .38100
   .29    .38251    .11409      .74    .30339    .27035     1.19    .19652    .38298
   .30    .38139    .11791      .75    .30114    .27337     1.20    .19419    .38493
   .31    .38023    .12172      .76    .29887    .27637     1.21    .19186    .38686
   .32    .37903    .12552      .77    .29659    .27935     1.22    .18954    .38877
   .33    .37780    .12930      .78    .29431    .28230     1.23    .18724    .39065
   .34    .37654    .13307      .79    .29200    .28524     1.24    .18494    .39251
   .35    .37524    .13683      .80    .28969    .28814     1.25    .18265    .39435
   .36    .37391    .14058      .81    .28737    .29103     1.26    .18037    .39617
   .37    .37255    .14431      .82    .28504    .29389     1.27    .17810    .39796
   .38    .37115    .14803      .83    .28269    .29673     1.28    .17585    .39973
   .39    .36973    .15173      .84    .28034    .29955     1.29    .17360    .40147
   .40    .36827    .15542      .85    .27798    .30234     1.30    .17137    .40320
   .41    .36678    .15910      .86    .27562    .30511     1.31    .16915    .40490
   .42    .36526    .16276      .87    .27324    .30785     1.32    .16694    .40658
   .43    .36371    .16640      .88    .27086    .31057     1.33    .16474    .40824
   .44    .36213    .17003      .89    .26848    .31327     1.34    .16256    .40988
Table I (continued). Ordinates and Areas of the Normal Curve

    t     ordinate   area        t     ordinate   area        t     ordinate   area

  1.35    .16038    .41149     1.80    .07895    .46407     2.25    .03174    .48778
  1.36    .15822    .41309     1.81    .07754    .46485     2.26    .03103    .48809
  1.37    .15608    .41466     1.82    .07614    .46562     2.27    .03034    .48840
  1.38    .15395    .41621     1.83    .07477    .46638     2.28    .02965    .48870
  1.39    .15183    .41774     1.84    .07341    .46712     2.29    .02898    .48899
  1.40    .14973    .41924     1.85    .07206    .46784     2.30    .02833    .48928
  1.41    .14764    .42073     1.86    .07074    .46856     2.31    .02768    .48956
  1.42    .14556    .42220     1.87    .06943    .46926     2.32    .02705    .48983
  1.43    .14350    .42364     1.88    .06814    .46995     2.33    .02643    .49010
  1.44    .14146    .42507     1.89    .06687    .47062     2.34    .02582    .49036
  1.45    .13943    .42647     1.90    .06562    .47128     2.35    .02522    .49061
  1.46    .13742    .42786     1.91    .06439    .47193     2.36    .02463    .49086
  1.47    .13542    .42922     1.92    .06316    .47257     2.37    .02406    .49111
  1.48    .13344    .43056     1.93    .06195    .47320     2.38    .02349    .49134
  1.49    .13147    .43189     1.94    .06077    .47381     2.39    .02294    .49158
  1.50    .12952    .43319     1.95    .05959    .47441     2.40    .02239    .49180
  1.51    .12758    .43448     1.96    .05844    .47500     2.41    .02186    .49202
  1.52    .12566    .43574     1.97    .05730    .47558     2.42    .02134    .49224
  1.53    .12376    .43699     1.98    .05618    .47615     2.43    .02083    .49245
  1.54    .12188    .43822     1.99    .05508    .47670     2.44    .02033    .49266
  1.55    .12001    .43943     2.00    .05399    .47725     2.45    .01984    .49286
  1.56    .11816    .44062     2.01    .05292    .47778     2.46    .01936    .49305
  1.57    .11632    .44179     2.02    .05186    .47831     2.47    .01889    .49324
  1.58    .11450    .44295     2.03    .05082    .47882     2.48    .01842    .49343
  1.59    .11270    .44408     2.04    .04980    .47932     2.49    .01797    .49361
  1.60    .11092    .44520     2.05    .04879    .47982     2.50    .01753    .49379
  1.61    .10915    .44630     2.06    .04780    .48030     2.51    .01709    .49396
  1.62    .10741    .44738     2.07    .04682    .48077     2.52    .01667    .49413
  1.63    .10567    .44845     2.08    .04586    .48124     2.53    .01625    .49430
  1.64    .10396    .44950     2.09    .04491    .48169     2.54    .01585    .49446
  1.65    .10226    .45053     2.10    .04398    .48214     2.55    .01545    .49461
  1.66    .10059    .45154     2.11    .04307    .48257     2.56    .01506    .49477
  1.67    .09893    .45254     2.12    .04217    .48300     2.57    .01468    .49492
  1.68    .09728    .45352     2.13    .04128    .48341     2.58    .01431    .49506
  1.69    .09566    .45449     2.14    .04041    .48382     2.59    .01394    .49520
  1.70    .09405    .45543     2.15    .03955    .48422     2.60    .01358    .49534
  1.71    .09246    .45637     2.16    .03871    .48461     2.61    .01323    .49547
  1.72    .09089    .45728     2.17    .03788    .48500     2.62    .01289    .49560
  1.73    .08933    .45818     2.18    .03706    .48537     2.63    .01256    .49573
  1.74    .08780    .45907     2.19    .03626    .48574     2.64    .01223    .49585
  1.75    .08628    .45994     2.20    .03547    .48610     2.65    .01191    .49598
  1.76    .08478    .46080     2.21    .03470    .48645     2.66    .01160    .49609
  1.77    .08329    .46164     2.22    .03394    .48679     2.67    .01130    .49621
  1.78    .08183    .46246     2.23    .03319    .48713     2.68    .01100    .49632
  1.79    .08038    .46327     2.24    .03246    .48745     2.69    .01071    .49643
Table I (continued). Ordinates and Areas of the Normal Curve

    t     ordinate   area        t     ordinate   area        t     ordinate   area

  2.70    .01042    .49653     3.15    .00279    .49918     3.60    .00061    .49984
  2.71    .01014    .49664     3.16    .00271    .49921     3.61    .00059    .49985
  2.72    .00987    .49674     3.17    .00262    .49924     3.62    .00057    .49985
  2.73    .00961    .49683     3.18    .00254    .49926     3.63    .00055    .49986
  2.74    .00935    .49693     3.19    .00246    .49929     3.64    .00053    .49986
  2.75    .00909    .49702     3.20    .00238    .49931     3.65    .00051    .49987
  2.76    .00885    .49711     3.21    .00231    .49934     3.66    .00049    .49987
  2.77    .00861    .49720     3.22    .00224    .49936     3.67    .00047    .49988
  2.78    .00837    .49728     3.23    .00216    .49938     3.68    .00046    .49988
  2.79    .00814    .49736     3.24    .00210    .49940     3.69    .00044    .49989
  2.80    .00792    .49744     3.25    .00203    .49942     3.70    .00042    .49989
  2.81    .00770    .49752     3.26    .00196    .49944     3.71    .00041    .49990
  2.82    .00748    .49760     3.27    .00190    .49946     3.72    .00039    .49990
  2.83    .00727    .49767     3.28    .00184    .49948     3.73    .00038    .49990
  2.84    .00707    .49774     3.29    .00178    .49950     3.74    .00037    .49991
  2.85    .00687    .49781     3.30    .00172    .49952     3.75    .00035    .49991
  2.86    .00668    .49788     3.31    .00167    .49953     3.76    .00034    .49992
  2.87    .00649    .49795     3.32    .00161    .49955     3.77    .00033    .49992
  2.88    .00631    .49801     3.33    .00156    .49957     3.78    .00031    .49992
  2.89    .00613    .49807     3.34    .00151    .49958     3.79    .00030    .49992
  2.90    .00595    .49813     3.35    .00146    .49960     3.80    .00029    .49993
  2.91    .00578    .49819     3.36    .00141    .49961     3.81    .00028    .49993
  2.92    .00562    .49825     3.37    .00136    .49962     3.82    .00027    .49993
  2.93    .00545    .49831     3.38    .00132    .49964     3.83    .00026    .49994
  2.94    .00530    .49836     3.39    .00127    .49965     3.84    .00025    .49994
  2.95    .00514    .49841     3.40    .00123    .49966     3.85    .00024    .49994
  2.96    .00499    .49846     3.41    .00119    .49968     3.86    .00023    .49994
  2.97    .00485    .49851     3.42    .00115    .49969     3.87    .00022    .49995
  2.98    .00471    .49856     3.43    .00111    .49970     3.88    .00021    .49995
  2.99    .00457    .49861     3.44    .00107    .49971     3.89    .00021    .49995
  3.00    .00443    .49865     3.45    .00104    .49972     3.90    .00020    .49995
  3.01    .00430    .49869     3.46    .00100    .49973     3.91    .00019    .49995
  3.02    .00417    .49874     3.47    .00097    .49974     3.92    .00018    .49996
  3.03    .00405    .49878     3.48    .00094    .49975     3.93    .00018    .49996
  3.04    .00393    .49882     3.49    .00090    .49976     3.94    .00017    .49996
  3.05    .00381    .49886     3.50    .00087    .49977     3.95    .00016    .49996
  3.06    .00370    .49889     3.51    .00084    .49978     3.96    .00016    .49996
  3.07    .00358    .49893     3.52    .00081    .49978     3.97    .00015    .49996
  3.08    .00348    .49897     3.53    .00079    .49979     3.98    .00014    .49997
  3.09    .00337    .49900     3.54    .00076    .49980     3.99    .00014    .49997
  3.10    .00327    .49903     3.55    .00073    .49981
  3.11    .00317    .49906     3.56    .00071    .49981
  3.12    .00307    .49910     3.57    .00068    .49982
  3.13    .00298    .49913     3.58    .00066    .49983
  3.14    .00288    .49916     3.59    .00063    .49983
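Entries of Table I can be recomputed when a value is in doubt. The
sketch below evaluates the ordinate of the normal curve and the area
from 0 to $t$ with the error function of Python's standard library; the
function names are chosen here for illustration.

```python
import math

def ordinate(t):
    """Ordinate of the normal curve: exp(-t^2/2) / sqrt(2*pi)."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def area(t):
    """Area under the normal curve from 0 to t."""
    return 0.5 * math.erf(t / math.sqrt(2))

for t in (0.50, 1.00, 1.96, 2.576):
    print(f"{t:5.3f}  {ordinate(t):.5f}  {area(t):.5f}")
```

For $t = 1.96$ the area is .47500, so that the area outside
$t = \pm 1.96$ is .05; for $t = 2.576$ the area outside is very nearly
.01.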
Table II. 5% (Roman Type) and 1% (Bold Face Type) Points for the Distribution of F
Table III. Table of $\chi^2$ Probability Scale (from R. A. Fisher's Table)

  Degrees of          Chance of Exceeding Given Value of $\chi^2$
  Freedom
      n         .50      .30      .20      .10      .05      .02      .01

                              Values of $\chi^2$

      1         .45     1.07     1.64     2.71     3.84     5.41     6.63
      2        1.39     2.41     3.22     4.60     5.99     7.82     9.21
      3        2.37     3.66     4.64     6.25     7.81     9.84    11.34
      4        3.36     4.88     5.99     7.78     9.49    11.67    13.28
      5        4.35     6.06     7.29     9.24    11.07    13.39    15.09
      6        5.35     7.23     8.56    10.64    12.59    15.03    16.81
      7        6.35     8.38     9.80    12.02    14.07    16.62    18.47
      8        7.34     9.52    11.03    13.36    15.51    18.17    20.09
      9        8.34    10.66    12.24    14.68    16.92    19.68    21.67
     10        9.34    11.78    13.44    15.99    18.31    21.16    23.21
     11       10.34    12.90    14.63    17.27    19.67    22.62    24.72
     12       11.34    14.01    15.81    18.55    21.03    24.05    26.22
     13       12.34    15.12    16.98    19.81    22.36    25.47    27.69
     14       13.34    16.22    18.15    21.06    23.68    26.87    29.14
     15       14.34    17.32    19.31    22.31    25.00    28.26    30.58
     16       15.34    18.42    20.46    23.54    26.30    29.63    32.00
     17       16.34    19.51    21.61    24.77    27.59    30.99    33.41
     18       17.34    20.60    22.76    25.99    28.87    32.35    34.80
     19       18.34    21.69    23.90    27.20    30.14    33.69    36.19
     20       19.34    22.77    25.04    28.41    31.41    35.02    37.57
     21       20.34    23.86    26.17    29.61    32.67    36.34    38.93
     22       21.34    24.94    27.30    30.81    33.92    37.66    40.29
     23       22.34    26.02    28.43    32.01    35.17    38.97    41.64
     24       23.34    27.10    29.55    33.20    36.41    40.27    42.98
     25       24.34    28.17    30.67    34.38    37.65    41.57    44.31
     26       25.34    29.25    31.79    35.56    38.88    42.86    45.64
     27       26.34    30.32    32.91    36.74    40.11    44.14    46.96
     28       27.34    31.39    34.03    37.92    41.34    45.42    48.28
     29       28.34    32.46    35.14    39.09    42.56    46.69    49.59
     30       29.34    33.53    36.25    40.26    43.77    47.96    50.89

For larger values of $n$, $\sqrt{2\chi^2} - \sqrt{2n - 1}$ may be
referred approximately to normal probability scale.
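The quality of this approximation can be judged at the last line of the
table. The sketch below refers $\sqrt{2\chi^2} - \sqrt{2n - 1}$ to the
normal scale for $n = 30$ and the tabled value 43.77, whose chance of
being exceeded is given as .05; the function name is chosen here for
illustration.

```python
import math
from statistics import NormalDist

def approx_chance_of_exceeding(chi2, n):
    """Refer sqrt(2*chi2) - sqrt(2n - 1) to the normal scale (upper tail)."""
    deviate = math.sqrt(2 * chi2) - math.sqrt(2 * n - 1)
    return 1 - NormalDist().cdf(deviate)

p30 = approx_chance_of_exceeding(43.77, 30)
print(round(p30, 3))
```

The approximate chance comes out a little below the tabled .05, which
gives an idea of the error to be expected at $n = 30$; the agreement
should improve as $n$ grows.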